Abstract

We develop a novel mechanism for coordinated, distributed multiagent planning. We consider
problems stated as a collection of single-agent planning problems coupled by common soft
constraints on resource consumption. (Resources may be real or fictitious, the latter introduced as a
tool for factoring the problem.) A key idea is to recast the distributed planning problem as
learning in a repeated game between the original agents and a newly introduced group of adversarial
agents who influence prices for the resources. The adversarial agents benefit from arbitrage: that
is, their incentive is to uncover violations of the resource usage constraints and, by selfishly
adjusting prices, encourage the original agents to avoid plans that cause such violations. If all agents
employ no-external-regret learning algorithms in the course of this repeated interaction, we are
able to show that our mechanism can achieve design goals such as social optimality (efficiency),
budget balance, and Nash-equilibrium convergence to within an error which approaches zero as
the agents gain experience. In particular, the agents’ average plans converge to a socially optimal
solution for the original planning task. We present experiments in a simulated network routing
domain demonstrating our method’s ability to reliably generate sound plans.


1  Introduction 


In this work, we develop a novel, distributed multiagent planning mechanism. Our mechanism
coordinates the different, individual goals of participating agents $P_1, \ldots, P_k$ to achieve a globally
desirable plan. While the agents could in principle compute the optimal global plan in a centralized
manner, distributed approaches can improve robustness, fault tolerance, scalability (both in
problem complexity and in the number of agents), and flexibility in changing environments [SD99].
We consider multiagent planning problems stated as $k \in \mathbb{N}$ single-agent convex optimization
problems that are coupled by $n \in \mathbb{N}$ linear, soft constraints with positive coefficients. (By a soft
constraint, we mean that violations are feasible, but are penalized by a convex loss function such
as a hinge loss.) This representation includes, for example, network routing problems, in which
each agent’s feasible region represents the set of paths from a source to a sink, its objective is to
find a low-latency path, and a soft constraint represents the additional delay caused by congestion
on a link used by multiple agents.

Since all coupling constraints (also referred to as inter-agent constraints) are soft, each agent’s
feasible region does not depend on the actions of the other agents.¹ So, the agents could plan
independently and be guaranteed that their joint actions would be feasible. This interaction is a
convex game [SL07]: each player simultaneously selects her plan from a convex set, and if we hold
all plans but one fixed, the remaining player’s loss is a convex function of her plan. By playing
this convex game repeatedly, the players could learn about one another’s behavior and adjust their
actions accordingly. With appropriate learning algorithms they could even ensure convergence to
an equilibrium [BHLR07, GGMZ07]. Unfortunately, while distributed and robust, this naive setup
can lead to highly suboptimal global behavior: a selfish agent which can gain any benefit from
using a congested link will do so, even if the resulting cost to other agents would far outweigh its
own gain [Rou07].

To overcome this problem, a key idea of our approach is to introduce additional adversarial agents
$A_1, \ldots, A_n$, each of which can influence the cost of one of the resources by collecting usage fees.
Like the original agents, the new agents are self-interested. But, collectively, they encourage the
original agents to avoid excessive resource usage: we show below how to define their revenue
functions so that they effectively perform arbitrage, allocating extra constraint-violation costs to
each original agent in proportion to its responsibility for the violations.

The way we define the adversarial agents’ revenues and payments effectively decouples the
individual planning problems: an original agent perceives the other original agents’ actions only
through their effects on the choices of the adversarial agents. So, under our mechanism, the
original agents never need to communicate with one another directly. Instead, they communicate with
the adversarial agents to find out prices and declare demands for the resources they need to use.
If an original agent never uses a particular resource, it never needs to send messages to the
corresponding adversarial agent. This decoupling can greatly reduce communication requirements
and increase robustness compared to the centralized solution: a central planner must communicate
with every agent on every time step, and so constitutes a choke point for communication as well as
a single point of failure.

¹ For a treatment of how our mechanism can be used in settings where inter-agent constraints are
hard, refer to Appendix E.

Because  we  have  decoupled  the  agents  from  one  another,  each  individual  agent  no  longer  needs 
to  worry  about  the  whole  planning  problem.  Instead,  it  only  has  to  solve  a  local  online  convex 
problem  (OCP)  [Gor99,  Zin03]  independently  and  selfishly.  To  solve  its  OCP,  an  agent  could 
use  any  learning  algorithm  it  desired;  but,  in  this  paper,  we  explore  what  happens  if  the  agents 
use  no-regret  learning  algorithms  such  as  Greedy  Projection  [Zin03].  No-regret  algorithms  are  a 
natural  choice  for  agents  in  a  multi-player  game,  since  they  provide  performance  guarantees  that 
other  types  of  algorithms  do  not.  And,  as  we  will  show,  if  all  agents  commit  to  no-regret  learning, 
several  desirable  properties  result. 

More specifically, if each agent’s average per-iteration regret approaches zero, the agents will learn
a globally optimal plan, in two senses: first, the average per-iteration cost, summed over all of the
agents, will converge to the optimal social cost for the original planning problem. And second,
the average overall plan of the original agents will converge to a socially optimal solution of the
original planning problem.

These two results lead us to propose two different mechanism variants: in the online setup,
motivated by the first result, learning takes place online in the classic sense; all agents choose and
execute their plans and make and receive payments in every learning iteration. By contrast, in
the negotiation setup, motivated by the second result, the agents learn and plan as usual, but only
simulate the execution of their chosen joint plan. The simulated results (costs or rewards) from
this proposed joint plan provide feedback for the learners. After a sufficient number of learning
iterations, each agent averages together all of its proposed plans and executes the average plan. One
can interpret either setup, but the negotiation setup in particular, as a very simple auction where
the goods are resources. A plan which consumes a resource is effectively a bid for that resource;
the resource prices are determined by the agents’ learning behavior in response to the bids.

Just as with any mechanism, because our agents are selfish, we need to consider the impact of
individual incentives. Focusing on the negotiation setup, we provide (mainly asymptotic) guarantees
of Nash-equilibrium convergence of the overall learning outcome as well as classic mechanism
design goals such as budget balance, individual rationality, and efficiency.

This document is a long version of material that appeared in the proceedings of the 7th
International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2008) [CG08] and
contains proofs that had to be omitted in the conference publication.


2  Preliminaries 

In an online convex program [Gor99, Zin03], a possibly adversarial sequence $(\Gamma_{(t)})_{t \in \mathbb{N}}$ of convex
cost functions is revealed step by step. (Equivalently, one could substitute concave reward
functions.) At each step $t$, the OCP algorithm must choose a play $x_{(t)}$ from its feasible region $F$ while
only knowing the past cost functions $\Gamma_{(q)}$ and choices $x_{(q)}$ ($q \le t - 1$). After the choice is made,
the current cost function $\Gamma_{(t)}$ is revealed, and the algorithm pays $\Gamma_{(t)}(x_{(t)})$.

To measure the performance of an OCP algorithm, we can compare its accumulated cost up through
step $T$ to an estimate of the best cost attainable against the sequence $(\Gamma_{(t)})_{t = 1 \ldots T}$. Here, we will
estimate the best attainable cost as the cost of the best constant play $s_{(T)} \in F$, chosen with
knowledge of $\Gamma_{(1)}, \ldots, \Gamma_{(T)}$. This choice leads to a measure called external regret or just regret:

$$R(T) = \sum_{t=1}^{T} \Gamma_{(t)}(x_{(t)}) - \sum_{t=1}^{T} \Gamma_{(t)}(s_{(T)}).$$

An algorithm is no-(external)-regret iff it guarantees that $R(T)$ grows slower than $O(T)$, i.e.,
$R(T) \le \Delta(T) \in o(T)$. $\Delta(T)$ is a regret bound. (The term no-regret is motivated by the fact
that the limiting average regret is no more than zero, $\limsup_{T \to \infty} R(T)/T \le 0$.)
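To make the definition of regret concrete, the following minimal sketch runs Greedy Projection (projected online gradient descent) [Zin03] on the interval $F = [0, 1]$ against linear costs $\Gamma_{(t)}(x) = g_t x$ and measures its external regret. The alternating cost sequence, the step size $1/\sqrt{t}$, and all function names are illustrative assumptions, not part of the mechanism itself.

```python
import math

def project(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the feasible interval F = [lo, hi]."""
    return min(hi, max(lo, x))

def greedy_projection_regret(gradients, x0=0.5):
    """Play projected gradient steps against linear costs g_t * x on [0, 1]."""
    x = x0
    incurred = 0.0
    for t, g in enumerate(gradients, start=1):
        incurred += g * x                    # pay Gamma_(t)(x_(t))
        x = project(x - g / math.sqrt(t))    # assumed step size 1/sqrt(t)
    # For linear costs on [0, 1], the best constant play is x = 0 or x = 1.
    best_constant = min(0.0, sum(gradients))
    return incurred - best_constant

T = 10000
gradients = [1.0 if t % 2 == 0 else -1.0 for t in range(T)]  # hypothetical costs
regret = greedy_projection_regret(gradients)
print(regret / T)  # average per-iteration regret
```

On this sequence the average regret `regret / T` is small and shrinks as `T` grows, which is the behavior the no-regret property asks for.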

We define the convex conjugate or dual [BV04] of a function $\Gamma(x)$ to be
$\Gamma^*(y) = \sup_{x \in \operatorname{dom} \Gamma} [\langle x, y \rangle - \Gamma(x)]$.
The conjugate function $\Gamma^*$ is always closed and convex, and if $\Gamma$ is closed and convex, then
$\Gamma^{**} = \Gamma$ pointwise.

2.1  Model  and  Notation 

We wish to model interaction among $k$ player agents, $P_1, \ldots, P_k$. We represent player $P_i$’s individual
planning problem as a convex program: choose a vector $p_i$ from a compact, convex feasible set
$F_{P_i}$ to minimize the intrinsic cost $\langle c_i, p_i \rangle$. (In addition to the intrinsic cost, player $P_i$ will attempt
to minimize cost terms which arise from interactions with other agents; we will define these extra
terms below and add them to $P_i$’s objective.) We assume that the intrinsic cost vector $c_i$ and
feasible region $F_{P_i}$ are private information for $P_i$; that is, $P_i$ may choose to inform other players
about them, but may also choose to be silent or to lie.

We assume each feasible set $F_{P_i}$ is a subset of some finite-dimensional $\mathbb{R}$-Hilbert space $V_{P_i}$ with
standard scalar product $\langle \cdot, \cdot \rangle_{V_{P_i}}$. We write $V = V_{P_1} \times \ldots \times V_{P_k}$ and $F_P = F_{P_1} \times \ldots \times F_{P_k}$
for the overall planning space and feasible region, and $p = (p_1; \ldots; p_k)$ and $c = (c_1; \ldots; c_k)$
for the overall plan and combined objective. (We use ; to denote vertical stacking of vectors,
consistent with Matlab usage.) And, for any $c, p \in V$, we write $\langle c, p \rangle_V = \sum_i \langle c_i, p_i \rangle_{V_{P_i}}$ for our
scalar product on $V$. (We omit subscripts on $\langle \cdot, \cdot \rangle$ when doing so will not cause confusion.) For
convenience, we assume each feasible set only contains vectors with nonnegative components; we
can achieve this property by changing coordinates if necessary.

We model the coupling among players by a set of $n$ soft linear constraints, which we interpret as
resource consumption constraints. That is, we assume that there are vectors $l_{ji} \ge 0$ such that the
consumption of resource $j$ by plan $p_i \in V_{P_i}$ is $\langle l_{ji}, p_i \rangle$. (So, the total consumption of resource $j$ is
$\langle l_j, p \rangle$, where $l_j = (l_{j1}; \ldots; l_{jk})$.) And, we assume that there are monotone increasing, continuous,
convex penalty functions $\beta_j(\nu)$ and scalars $y_j \ge 0$ so that the overall cost due to consumption of
resource $j$ in plan $p$ is $\beta_j(\langle l_j, p \rangle - y_j)$. In keeping with the interpretation of $\beta_j$ as enforcing a
soft constraint, we assume $\beta_j(\nu) = 0$ for all $\nu \le 0$. (For example, $\beta_j(\nu)$ could be the hinge loss
function $\max\{0, \nu\}$.) We define

$$\nu_j(p) = \langle l_j, p \rangle - y_j \tag{1}$$

to be the magnitude of violation of the $j$th soft resource constraint by plan $p$.² Because the resource
constraints are soft, no player can make another player’s chosen action infeasible; conflicts result
only in high cost rather than infeasibility.

² To enforce a hard constraint, we could choose a sufficiently small margin $\epsilon > 0$, replace $y_j$ by
$y_j - \epsilon$, and set $\beta_j(\nu) = \max\{0, \nu/\epsilon\}$.

The function $\beta_j$ describes the overall cost to all players of usage of resource $j$. We will assume
that, in the absence of any external coordination mechanism, the cost to player $i$ due to resource
$j$ is given by some function $\beta_{ji}(p)$ with $\sum_i \beta_{ji}(p) = \beta_j(\nu_j(p))$. We will call $\beta_{ji}$ the natural
cost or cost in nature of $P_i$’s resource usage. A typical choice for $\beta_{ji}$ is proportional to player $i$’s
consumption of resource $j$:

$$\beta_{ji}(p) = \begin{cases} 0, & \text{if } \langle l_j, p \rangle = 0 \\ \beta_j(\nu_j(p)) \, \langle l_{ji}, p_i \rangle / \langle l_j, p \rangle, & \text{otherwise} \end{cases} \tag{2}$$


So, including both her intrinsic costs and the natural costs of resource usage, player $i$’s objective is

$$\omega_{P_i}(p) = \langle c_i, p_i \rangle + \sum_j \beta_{ji}(p).$$

We will write $\omega(p) = \sum_i \omega_{P_i}(p)$ for the social cost; as stated above, our goal is to coordinate the
player agents to minimize $\omega(p)$. (With this notation, several facts mentioned above should now
be obvious: for example, since the individual objectives $\omega_{P_i}$ depend on the entire joint plan $p$,
the players cannot simply plan in isolation. Nor do we want the players to compute and follow
an equilibrium: using the above choice for $\beta_{ji}$ (which results in a setting similar to so-called
nonatomic congestion games), there are simple sequences of examples showing that the penalty
for following an equilibrium (called the price of anarchy) can be arbitrarily large [Rou07].)
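The model above can be illustrated with a minimal sketch for a single shared resource: it splits the hinge penalty among players in proportion to their consumption (Eq. 2) and adds the intrinsic costs to obtain the social cost. The numeric values and function names are hypothetical, and the inner products $\langle l_{ji}, p_i \rangle$ and $\langle c_i, p_i \rangle$ are assumed to be precomputed scalars.

```python
def hinge(v):
    """Hinge loss max{0, v}, used here as the penalty function beta_j."""
    return max(0.0, v)

def natural_costs(consumptions, y, beta=hinge):
    """Split beta(nu) among players in proportion to consumption (Eq. 2)."""
    total = sum(consumptions)
    if total == 0:
        return [0.0 for _ in consumptions]
    penalty = beta(total - y)            # beta_j(nu_j(p)), with nu_j from Eq. 1
    return [penalty * c / total for c in consumptions]

def social_cost(intrinsic, consumptions, y):
    """Sum over players of intrinsic cost plus natural cost (one resource)."""
    return sum(intrinsic) + sum(natural_costs(consumptions, y))

# Hypothetical instance: one resource with y = 1 and two players.
consumptions = [1.0, 0.5]    # <l_j1, p_1>, <l_j2, p_2>
intrinsic = [0.2, 0.1]       # <c_1, p_1>, <c_2, p_2>
shares = natural_costs(consumptions, y=1.0)
total_cost = social_cost(intrinsic, consumptions, y=1.0)
print(shares, total_cost)
```

By construction the shares sum to the overall penalty $\beta_j(\nu_j(p))$, as required of the natural costs.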


3  Problem  transformation 

In  this  section,  by  dualizing  our  soft  resource  constraints,  we  decouple  the  problem  of  finding  a 
socially  optimal  plan.  The  result  is  a  saddle-point  or  minimax  problem  whose  variables  are  the 
original  plan  vector  p  along  with  new  dual  variables  a,  defined  below.  By  associating  the  new 
variables  a  with  additional  agents,  called  adversarial  agents,  we  arrive  at  a  convex  game  with 
comparatively-sparse  interactions.  Based  on  this  game,  we  introduce  our  proposed  learning-based 
mechanism. 

3.1  Introduction  of  adversarial  agents 

Write $F_{A_j} = \operatorname{dom} \beta_j^*$. Since we have assumed that $\beta_j$ is continuous and convex, we know that
$\beta_j^{**} = \beta_j$ pointwise, that is, $\beta_j(\nu) = \sup_{a_j \in F_{A_j}} [a_j \nu - \beta_j^*(a_j)]$ for all $\nu \in \mathbb{R}$. Since $\beta_j^*$ will become
part of the objective function for the adversarial agents, and since many online learning algorithms
require compact domains, we will assume that $F_{A_j} = [0, u_j]$ for some scalar $u_j$. (For example, this
assumption is satisfied if the slope of $\beta_j$ is upper bounded by $u_j$, and achieves its upper bound. The
lower bound of zero follows from our previous assumptions that $\beta_j$ is monotone and $\beta_j(\nu) = 0$ for
$\nu \le 0$.) We will also assume that $\beta_j^*$ is continuous on its domain.
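As a worked example of the assumption $F_{A_j} = [0, u_j]$, consider the hinge loss $\beta_j(\nu) = \max\{0, \nu\}$ mentioned in Sec. 2.1. The short derivation below is an illustrative sketch that uses only the definition of the conjugate from Sec. 2; it suggests that in this case $u_j = 1$ and $\beta_j^*$ vanishes on its domain.

```latex
\begin{align*}
\beta_j^*(a) &= \sup_{\nu \in \mathbb{R}} \bigl[ a\nu - \max\{0, \nu\} \bigr] \\
  &= \max\Bigl\{ \sup_{\nu \le 0} a\nu,\ \sup_{\nu \ge 0} (a - 1)\nu \Bigr\}
   = \begin{cases} 0, & 0 \le a \le 1 \\ +\infty, & \text{otherwise,} \end{cases}
\end{align*}
so $\operatorname{dom} \beta_j^* = [0, 1]$, and the duality identity can be checked directly:
\[
\sup_{a \in [0, 1]} \bigl[ a\nu - \beta_j^*(a) \bigr]
  = \sup_{a \in [0, 1]} a\nu = \max\{0, \nu\} = \beta_j(\nu).
\]
```

The supremum is attained at $a = 1$ when the constraint is violated ($\nu > 0$) and at $a = 0$ when there is slack ($\nu < 0$), which matches the reading of $a$ as a price on the resource.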

We define $\Omega_{A_j} : V \times F_{A_j} \to \mathbb{R}$ as

$$\Omega_{A_j}(p, a_j) = a_j \nu_j(p) - \beta_j^*(a_j). \tag{3}$$

And, writing $a = (a_1; \ldots; a_n) \in F_A = F_{A_1} \times \ldots \times F_{A_n}$, we define³

$$\Omega : F_P \times F_A \to \mathbb{R}, \qquad (p, a) \mapsto \langle c, p \rangle + \sum_{j=1}^{n} \Omega_{A_j}(p, a_j) \tag{4}$$

(note the inclusion of the intrinsic cost $\langle c, p \rangle$). Because of the duality identity mentioned above,
along with our assumption about $\operatorname{dom} \beta_j^*$, we know that for all plans $p$,
$\sup_{a_j \in F_{A_j}} \Omega_{A_j}(p, a_j) = \beta_j(\nu_j(p))$, and so

$$\omega(p) = \max_{a \in F_A} \Omega(p, a), \quad \forall p \in F_P. \tag{5}$$

Note that we have replaced sup by max in Eq. 5: since $\Omega(p, \cdot)$ is a closed concave function, it
achieves its supremum on a compact domain such as $F_A$.

³ Note, in the AAMAS version [CG08] the explicit mention of the restriction of $\Omega$ to feasible plans was omitted.
However, whenever we considered saddle-points we also considered them with respect to this restricted function
$\Omega : F_P \times F_A \to \mathbb{R}$.

Remark 3.1. Note, Eq. 5 establishes the connection between saddle-points of $\Omega$ and overall player
plans that collectively minimize total cost in nature: if $(\bar{p}, \bar{a})$ is a saddle-point then we have

$$\bar{p} \in \operatorname{argmin}_{p \in F_P} \max_{a} \Omega(p, a) = \operatorname{argmin}_{p \in F_P} \omega(p). \tag{6}$$

(Cf. [Roc70]).

Now, as promised, we can introduce the adversarial agents: the adversarial agent $A_j$ controls the
parameter $a_j \in F_{A_j}$, and tries to maximize its revenue $\Omega_{A_j}(p, a_j) - \beta_j(\nu_j(p))$. Note that $\beta_j(\nu_j(p))$
does not depend on $a_j$, and so does not affect the choice of $a_j$ once $p$ is fixed.

To give $A_j$ this revenue, we will have player $P_i$ pay adversary $A_j$ the amount
$a_j \langle l_{ji}, p_i \rangle - \beta_{ji}(p) - d_{ji} r_j(a_j)$. Here the remainder function $r_j$ is defined as $r_j(a_j) = a_j y_j + \beta_j^*(a_j)$; the nonnegative
weights $d_{ji}$ are responsible for dividing up the remainder among all player agents, so we require
$\sum_i d_{ji} = 1$ for each $j$. Given these definitions, it is easy to check that the sum of all payments to
$A_j$ is indeed $\Omega_{A_j}(p, a_j) - \beta_j(\nu_j(p))$ as claimed.

We can interpret the above payments as follows: $A_j$ sets the per-unit price $a_j$ for consumption of
resource $j$. $P_i$ pays $A_j$ according to consumption, $a_j \langle l_{ji}, p_i \rangle$, and is reimbursed for her share of the
actual resource cost, $\beta_{ji}(p)$. For the privilege of setting the per-unit price, $A_j$ pays a fee $r_j(a_j)$; this
fee is distributed back to the player agents according to weights $d_{ji}$. (We show in Appendix D that
the fee is always nonnegative.) Since the entire revenue for the agents $A_j$ arises from payments
by the player agents, we can think of $A_j$ as opponents for the players; this is qualitatively true
even though our game has many players and even though the player agent payoff functions contain
terms that do not involve $a$.

Including payments to adversaries, $P_i$’s cost becomes

$$\Omega_{P_i}(p_i, a) = \langle c_i, p_i \rangle + \sum_j \bigl( a_j \langle l_{ji}, p_i \rangle - d_{ji} r_j(a_j) \bigr). \tag{7}$$

By the above construction, we have achieved several important properties:

• First, as promised, $P_i$’s cost does not depend on any components of $p$ other than $p_i$, and $A_j$’s
revenue does not depend on any components of $a$ other than $a_j$. So, given an adversarial play
$a$, each player could plan by independently optimizing $\Omega_{P_i}(p_i, a)$. Similarly, given a plan $p$,
each adversary could separately optimize $\Omega_{A_j}(p, a_j)$. ($A_j$ can ignore the term $\beta_j(\nu_j(p))$ if $p$
is fixed.) So, the players’ optimization problems are decoupled given $a$, and the adversaries’
optimization problems are decoupled given $p$.

• Second, if an adversarial agent plays optimally, her revenue will be exactly zero, since
$\max_{a_j \in [0, u_j]} \Omega_{A_j}(p, a_j) = \beta_j(\nu_j(p))$. (Suboptimal play will lead to a negative revenue, i.e.,
a loss or cost.)

• Third, the total cost to all player agents is

$$\sum_i \Omega_{P_i}(p_i, a) = \Omega(p, a). \tag{8}$$

On the other hand, the total revenue to all adversaries is
$\Omega_A(p, a) = \Omega(p, a) - \langle c, p \rangle - \sum_j \beta_j(\nu_j(p))$. If the adversaries each play optimally, then $\Omega_A(p, a)$ will be zero, so we will
have

$$\Omega(p, a) = \langle c, p \rangle + \sum_j \beta_j(\nu_j(p)) = \omega(p). \tag{9}$$

Combining Eqs. 8 and 9, we find that if the adversaries play optimally, the total cost to the
player agents is $\omega(p)$, just as it was in our original planning problem.

• Finally, since $\Omega(p, a)$ is a continuous saddle-function on a compact domain [Roc70], it
must have a saddle-point $(\bar{p}, \bar{a})$. (By definition, a saddle-point is a point $(\bar{p}, \bar{a})$ such that
$\Omega(p, \bar{a}) \ge \Omega(\bar{p}, \bar{a}) \ge \Omega(\bar{p}, a)$ for all $p \in F_P$ and $a \in F_A$.) By the decoupling
arguments above, we must have that $\bar{p}_i \in \operatorname{argmin}_{p_i \in F_{P_i}} \Omega_{P_i}(p_i, \bar{a})$ for each $i$, and that
$\bar{a}_j \in \operatorname{argmax}_{a_j \in F_{A_j}} \Omega_{A_j}(\bar{p}, a_j)$ for each $j$. (The latter is true since $\Omega$ and $\Omega_A$ differ only by terms
that do not depend on $a$.)
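The payment rule and the second property above can be checked numerically. The following minimal sketch treats a single resource with hinge penalty, for which the conjugate is zero on $[0, 1]$ (so $u_j = 1$); it verifies that the payments collected by the adversary sum to $\Omega_{A_j}(p, a_j) - \beta_j(\nu_j(p))$ and records the revenue for a few prices. All numeric values and function names are hypothetical.

```python
def hinge(v):
    """Hinge loss beta(nu) = max{0, nu}."""
    return max(0.0, v)

def hinge_conj(a):
    """Conjugate of the hinge loss: zero on its domain [0, 1]."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("price outside dom beta* = [0, 1]")
    return 0.0

def omega_A(cons, y, a):
    """Eq. 3 for a single resource: a * nu(p) - beta*(a)."""
    return a * (sum(cons) - y) - hinge_conj(a)

def payments(cons, y, a, d):
    """Amount each player pays the adversary that prices this resource."""
    total = sum(cons)
    penalty = hinge(total - y)
    remainder = a * y + hinge_conj(a)    # r_j(a_j)
    natural = [penalty * c / total if total > 0 else 0.0 for c in cons]  # Eq. 2
    return [a * c - b - w * remainder for c, b, w in zip(cons, natural, d)]

# Hypothetical instance: consumptions <l_ji, p_i>, capacity y, weights d_ji.
cons, y, d = [1.0, 0.5], 1.0, [0.5, 0.5]
revenues = {}
for a in (0.0, 0.3, 1.0):
    revenues[a] = sum(payments(cons, y, a, d))
    expected = omega_A(cons, y, a) - hinge(sum(cons) - y)
    assert abs(revenues[a] - expected) < 1e-12   # payments sum to the revenue
print(revenues)
```

In this instance the constraint is violated, so the revenue is largest (and equal to zero, up to rounding) at the top price $a_j = 1$ and negative for lower prices.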

3.2  Planning  as  learning  in  a  repeated  game 

If we consider our planning problem as a game among all of the agents $P_i$ and $A_j$, we have just
shown that there exists a Nash equilibrium in pure strategies, and that in any Nash equilibrium,
the player plan $p$ must minimize $\omega(p)$ and therefore be socially optimal. To allow the agents to
find such an equilibrium, we now cast the planning problem as learning in a repeated game. We
will show that, if each agent employs a no-regret learning algorithm, the agents as a whole will
converge to a socially optimal plan, both in the sense that the average joint plan converges and in
the sense that the average social cost converges. (This result, while similar to well-known results
about convergence of no-regret algorithms to minimax equilibrium [FS96], does not follow from
these results, in part because our game is not constant-sum.) Note that, from the individual agent’s
perspective, playing in the repeated game is an OCP, and so using a no-regret learner would be a
reasonable choice; we explore the effect of this choice in more detail below in Sec. 4.

The repeated game is played between the $k$ players and the $n$ adversarial agents. Based on their
local histories of past observations, in each round $t$, each player $P_i$ chooses a current pure strategy
$p_{i(t)} \in F_{P_i}$, and simultaneously, each adversary $A_j$ chooses a current resource price $a_{j(t)} \in F_{A_j}$.
We write $p_{(t)} = (p_{1(t)}; \ldots; p_{k(t)})$ and $a_{(t)} = (a_{1(t)}; \ldots; a_{n(t)})$ for the joint actions of all players and
adversaries, respectively.

After choosing $p_{(t)}$ and $a_{(t)}$, the players send their current resource consumptions $\langle l_{ji}, p_{i(t)} \rangle$ to the
adversaries, and the adversaries send their current prices to the players. In the online model, $P_i$
observes $\beta_{ji}(p_{(t)})$ and sends it to $A_j$ as well; in the negotiation model, we assume that $\beta_{ji}$ is of
the form given in Eq. 2, so that $A_j$ can compute $\beta_{ji}(p_{(t)})$. The above information allows each
$P_i$ to compute its current cost function $\Omega_{P_i(t)}(\cdot) = \Omega_{P_i}(\cdot, a_{(t)})$ and its cost $\Omega_{P_i(t)}(p_{i(t)})$. It also
allows each $A_j$ to compute $\Omega_{A_j(t)}(\cdot) = \Omega_{A_j}(p_{(t)}, \cdot)$ as well as $\beta_{j(t)} = \beta_j(\nu_j(p_{(t)}))$, and thus, its
total revenue $\Omega_{A_j(t)}(a_{j(t)}) - \beta_{j(t)}$. (In fact, $A_j$ may avoid computing or storing $\beta_{j(t)}$ if desired,
since it does not influence that term directly.) In Sec. 3.3 below, we discuss how to implement the
necessary communication efficiently.

Each player $P_i$ then adds observation $p_{i(t)}$, $\Omega_{P_i(t)}(\cdot)$ to her local history, and each adversary $A_j$ adds
$a_{j(t)}$, $\Omega_{A_j(t)}(\cdot)$ to her local history. Finally, the system enters iteration $t + 1$, and the process repeats.
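The round structure described above can be sketched in a few lines for a toy instance: two players, each choosing a scalar plan in $[0, 1]$, share one resource with $l_{ji} = 1$, $y = 1$, and hinge penalty, so that the single adversary's price lies in $[0, 1]$. Every agent here uses projected gradient steps with step size $1/\sqrt{t}$ as a stand-in for an arbitrary no-regret learner; the cost values, step size, and function names are assumptions made for illustration only.

```python
import math

def clip01(x):
    """Projection onto [0, 1], the feasible set of every agent in this toy."""
    return min(1.0, max(0.0, x))

def social_cost(p, c, y=1.0):
    """omega(p): intrinsic costs plus hinge penalty on the shared resource."""
    return sum(ci * pi for ci, pi in zip(c, p)) + max(0.0, sum(p) - y)

def negotiate(c, T, y=1.0):
    """Run T rounds of simultaneous play; return the average plan and price."""
    p = [0.5 for _ in c]    # current player plans p_i(t)
    a = 0.5                 # current adversary price a_(t)
    p_sum = [0.0 for _ in c]
    a_sum = 0.0
    for t in range(1, T + 1):
        p_sum = [s + pi for s, pi in zip(p_sum, p)]
        a_sum += a
        eta = 1.0 / math.sqrt(t)
        nu = sum(p) - y     # violation computed from the reported consumptions
        # Players descend on Omega_{P_i}(., a); gradient w.r.t. p_i is c_i + a.
        new_p = [clip01(pi - eta * (ci + a)) for pi, ci in zip(p, c)]
        # The adversary ascends on Omega_{A_j}(p, .); gradient w.r.t. a is nu.
        a = clip01(a + eta * nu)
        p = new_p
    return [s / T for s in p_sum], a_sum / T

c = [-2.0, -0.5]            # hypothetical intrinsic costs (negative = benefit)
avg_plan, avg_price = negotiate(c, T=10000)
# For these costs the social optimum is p = (1, 0) with omega(p) = -2.
gap = social_cost(avg_plan, c) - (-2.0)
print(avg_plan, avg_price, gap)
```

In this instance the averaged plan approaches $(1, 0)$: the player who benefits more keeps the resource, while the rising price discourages the other player from using it.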

3.2.1  Game  between  two  synthesized  agents 

For analysis, it will help to construe our setup as a fictitious game between a synthesized player
agent, $P$, and a synthesized adversarial agent, $A$. When each component agent $P_i$ plays $p_{i(t)}$
and each component agent $A_j$ plays $a_{j(t)}$, then we imagine $P$ to play $p_{(t)}$ and $A$ to play $a_{(t)}$.
Accordingly, we understand $P$ to incur cost $\Omega(p_{(t)}, a_{(t)})$, and $A$ to have revenue
$\sum_j [\Omega_{A_j}(p_{(t)}, a_{j(t)}) - \beta_j(\nu_j(p_{(t)}))]$ in round $t$. These synthesized agents are merely theoretical notions serving to
simplify our reasoning; in practice there would never be a single agent controlling all players.
Using these synthesized agents, we will prove two results: first, immediately below, we show that
if the individual agents use no-regret algorithms, then the synthesized agents also achieve no regret.
And second, in Sec. 3.2.2, we show that if the synthesized agents achieve no regret, then they will
converge to an equilibrium of the game, in the two senses mentioned above.

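Lemma 3.2. If each individual agent $P_i$ achieves regret bound $\Delta_{P_i}(T)$, then the synthesized player
agent $P$ achieves regret bound $\Delta_P(T) := \sum_i \Delta_{P_i}(T)$. So, if $\Delta_{P_i}(T) \in o(T)$ for all $i$, then
$\Delta_P(T) \in o(T)$.

Proof. By definition, the regret $R_P(T)$ for agent $P$ is

$$R_P(T) = \sum_{t=1}^{T} \Omega(p_{(t)}, a_{(t)}) - \min_{p} \sum_{t=1}^{T} \Omega(p, a_{(t)})$$

and the regret for $P_i$ is $R_{P_i}(T) \le \Delta_{P_i}(T)$:

$$R_{P_i}(T) = \sum_{t=1}^{T} \Omega_{P_i(t)}(p_{i(t)}) - \min_{p_i} \sum_{t=1}^{T} \Omega_{P_i(t)}(p_i).$$

Owing to the decoupling effect of the adversary we have $\Omega(p, a_{(t)}) = \sum_i \Omega_{P_i(t)}(p_i)$. So, we can
expand $R_P(T)$ as

$$\begin{aligned}
\sum_{t=1}^{T} \Omega(p_{(t)}, a_{(t)}) - \min_{p \in F_P} \sum_{t=1}^{T} \Omega(p, a_{(t)})
&= \sum_{t=1}^{T} \sum_{i=1}^{k} \Omega_{P_i(t)}(p_{i(t)}) - \min_{p \in F_P} \sum_{t=1}^{T} \sum_{i=1}^{k} \Omega_{P_i(t)}(p_i) \\
&= \sum_{i=1}^{k} \sum_{t=1}^{T} \Omega_{P_i(t)}(p_{i(t)}) - \sum_{i=1}^{k} \min_{p_i \in F_{P_i}} \sum_{t=1}^{T} \Omega_{P_i(t)}(p_i) \\
&= \sum_{i=1}^{k} \Bigl( \sum_{t=1}^{T} \Omega_{P_i(t)}(p_{i(t)}) - \min_{p_i \in F_{P_i}} \sum_{t=1}^{T} \Omega_{P_i(t)}(p_i) \Bigr) \\
&= \sum_{i=1}^{k} R_{P_i}(T) \le \sum_{i=1}^{k} \Delta_{P_i}(T).
\end{aligned}$$

So, $R_P(T) \in o(T)$ as desired. □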

Analogously,  for  the  adversarial  agent  A  we  have  the  following  lemma.  The  proof  is  very  similar, 
and  is  therefore  omitted. 

Lemma 3.3. If each adversarial agent $A_j$ achieves regret bound $\Delta_{A_j}(T)$, then the synthesized
agent $A$ achieves regret bound $\Delta_A(T) := \sum_j \Delta_{A_j}(T)$. So, if $\Delta_{A_j}(T) \in o(T)$ for all $j$, then
$\Delta_A(T) \in o(T)$.

3.2.2  Social  optimality 

In this section, we investigate the behavior of the averaged strategies $p_{[T]} := \frac{1}{T} \sum_{t=1}^{T} p_{(t)}$ and
$a_{[T]} := \frac{1}{T} \sum_{t=1}^{T} a_{(t)}$, as well as the averaged costs $\frac{1}{T} \sum_{t=1}^{T} \Omega(p_{(t)}, a_{(t)})$. (Recall that the
negotiation version of our mechanism outputs the averaged strategies, while the online version of our
mechanism incurs the averaged costs.)

Starting with the averaged strategies, we show that if all players achieve no regret, we can guarantee
convergence of $(p_{[T]}, a_{[T]})$ to a set $K_P \times K_A$ of saddle-points of $\Omega$ (where $K_P \subseteq F_P$, $K_A \subseteq F_A$).
(While the sequences $p_{[T]}$ and $a_{[T]}$ may not converge, the distance of $p_{[T]}$ from $K_P$ and the distance
of $a_{[T]}$ from $K_A$ will approach zero; and, every cluster point of the sequence $(p_{[T]}, a_{[T]})$ will be in
$K_P \times K_A$. Due to the convexity and compactness of the feasible sets, each average strategy and
each cluster point will be feasible.)

Theorem 3.4. Let $p_{[T]} = \frac{1}{T} \sum_{t=1}^{T} p_{(t)}$, $a_{[T]} = \frac{1}{T} \sum_{t=1}^{T} a_{(t)}$ be the averaged, pure strategies of
synthesized player $P$ and adversary $A$, respectively. If $P$, $A$ each suffer sublinear external regret,
then as $T \to \infty$, $(p_{[T]}, a_{[T]})$ converges to a (bounded) subset $K_P \times K_A$ of saddle-points of the
player cost function $\Omega : F_P \times F_A \to \mathbb{R}$.

Proof. Refer to Appendix C. □

Since $\Omega$ is continuous on its domain, Thm. 3.4 lets us conclude that the outcome of negotiation is a
plan which approximately minimizes total player cost in nature: if we choose $T$ sufficiently large,
then $(p_{[T]}, a_{[T]})$ must be close to a saddle-point, and so the costs must be close to the costs
of a saddle-point. (To determine how large we need to choose $T$, we can look at the regret bounds
of the learning algorithms of the individual agents.) As expressed in Rem. 3.1, the player part $\bar{p}$
of a saddle-point incurs minimal total player cost in nature and, since the adversarial agents learn
to set prices as nature would, $\bar{p}$ is socially optimal (w.r.t. the player agents) with respect to both
nature and the mechanism.

If we choose to run our mechanism in online mode, we also need bounds on the average incurred
cost. Since $\omega(p) = \max_a \Omega(p, a)$, the following theorem tells us that the average social cost
approaches the optimal social cost in the long run.

Theorem 3.5. If $P$ and $A$ suffer sublinear external regret, then as $T \to \infty$,

$$\frac{1}{T} \sum_{t=1}^{T} \Omega(p_{(t)}, a_{(t)}) \to \min_{p} \max_{a} \Omega(p, a).$$

Proof. Refer to Appendix C. □

3.3  Communication  costs 

So far we have assumed that all player agents broadcast their resource usages (and possibly their
natural costs) to all adversarial agents, and all adversarial agents broadcast their prices to all player
agents. With this assumption, on every time step, each player sends one broadcast of size $O(n)$ (her
resource usages) and receives $n$ messages of size $O(1)$ (the resource prices), while each adversary
sends one broadcast of size $O(1)$ and receives $k$ messages of size $O(n)$, for a total of $n + k$
broadcasts per step, and a total incoming bandwidth of no more than $O(nk)$ at each agent.⁴ Even
under this simple assumption, the cost is somewhat better than a centralized planner, which would
have to receive $k$ much larger messages describing each player’s detailed optimization problem,
and send $k$ much larger messages describing each player’s optimal plan.

However, by exploiting locality, we can reduce bandwidth even further: in many problems we
can guarantee a priori that player $P_i$ will never use resource j, and in this case, we never need to
transmit $A_j$'s price $a_j$ to $P_i$. Similarly, if $P_i$ decides not to use resource j on a given trial, we never
need to transmit $\langle l_{ji}, p_{i(t)} \rangle$ to $A_j$. (To take full advantage of locality, we must also set the weights
$d_{ji}$ so that players do not receive payments from adversaries they would otherwise not need to talk
to.) So, by using targeted multicasts instead of broadcasts, we can confine each player's messages
to a small area of the network; in this case, no single node or link will see even O(k + n) traffic.
We  can  sometimes  reduce  bandwidth  even  further  by  combining  messages  as  they  flow  through 
the  network:  for  example,  two  resource  consumption  messages  destined  for  Aj  may  be  combined 
by  adding  their  reported  consumption  values. 
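
The two bandwidth savings described above can be sketched in a few lines. This is only an illustration under assumed data layouts: the message format (resource id, consumption) and the boolean locality map `uses` are hypothetical and do not come from the original mechanism description.

```python
from collections import defaultdict

def combine_usage_messages(messages):
    """Merge (resource_id, consumption) messages headed for the same adversary.

    Adding the reported consumptions preserves the total usage per resource
    while reducing the number of messages that must be forwarded.
    """
    totals = defaultdict(float)
    for resource_id, consumption in messages:
        totals[resource_id] += consumption
    return dict(totals)

def deliveries_per_step(uses):
    """Count player-to-adversary deliveries under broadcast vs. targeted multicast.

    uses[i][j] is True if player i may use resource j (hypothetical locality map).
    """
    k, n = len(uses), len(uses[0])
    broadcast = k * n                            # every player reaches every adversary
    multicast = sum(sum(row) for row in uses)    # only pairs that can interact
    return broadcast, multicast
```

For example, with three players and two resources where only four player-resource pairs can interact, the multicast count drops from six deliveries to four.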

Finally,  any  implementation  needs  to  make  sure  that  the  agents  cannot  gain  by  circumventing  the 
mechanism: e.g., no player should find out another's plan before committing to her own.5

4 Technically, the agents could multicast rather than broadcast, so that, e.g., one player would never see another
player’s  messages,  but  in  practice  one  would  not  expect  this  optimization  to  save  much. 

5 One undesirable effect of permitting agents to wait until all other agents have sent their

4 Design goals

In  designing  our  mechanism,  we  hope  to  ensure  that  individual,  incentive-driven  behavior  leads 
to desirable system-wide properties. Here, we establish some useful guarantees for the negotiation
version of our mechanism. The guarantees are convergence to Nash equilibrium, budget
balance,  individual  rationality,  and  efficiency. 

These guarantees follow from the fact that the learned negotiation outcome approaches a set of
saddle-points of $\Omega(p, a)$ in the limit (Thm. 3.4). By continuity of $\Omega$, we can therefore conclude
that, if we allow sufficient time for negotiation, the negotiation outcome is approximately a
saddle-point. (We will not address distributed detection of convergence, but merely assume that we use
our  global  regret  bounds  to  calculate  a  sufficiently  large  T  ahead  of  time;  obviously  efficiency 
could  be  improved  by  allowing  early  stopping.) 

Convergence  to  Nash  equilibrium.  When  working  with  selfish,  strategic  agents,  we  want  to 
know  whether  a  selfish  agent  has  an  incentive  to  unilaterally  deviate  from  its  part  of  the  negotiation 
outcome.  The  following  theorems  show  that  the  answer  is,  at  least  approximately,  no:  in  the  limit 
of large T, the negotiation outcomes $(p_{[T]}, a_{[T]})$ converge to a subset of Nash equilibria. So, by
continuity, $(p_{[T]}, a_{[T]})$ is an approximate Nash equilibrium for sufficiently large T; that is, each
agent  has  a  vanishing  incentive  to  deviate  unilaterally. 

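Theorem 4.1. Let $F_{P_i}$ denote the feasible set of player agent $P_i$ ($i \in \{1, \ldots, k\}$) and $F_{A_j}$ denote
the feasible set of adversarial agent $A_j$ ($j \in \{1, \ldots, n\}$). We have:

$\forall i \; \forall p_i' \in F_{P_i} \; \forall a \in K_A, p \in K_P : \Omega_{P_i}(p_i', a) \geq \Omega_{P_i}(p_i, a)$.

Proof. Let $p \in K_P$, $a \in K_A$. We know $(p, a)$ is a saddle-point of $\Omega$ with respect to minimization over $F_P$
and maximization over $F_A$. Hence, $\forall p' \in F_P : \Omega(p, a) \leq \Omega(p', a)$. Since $\Omega(p, a) = \sum_m \Omega_{P_m}(p_m, a)$, $\forall p, a$,
we have in particular: $\forall p_i' \in F_{P_i} : \Omega_{P_i}(p_i', a) + \sum_{m \neq i} \Omega_{P_m}(p_m, a) = \Omega(p_i', p_{-i}, a) \geq \Omega(p, a) =
\Omega_{P_i}(p_i, a) + \sum_{m \neq i} \Omega_{P_m}(p_m, a)$. Thus, $\Omega_{P_i}(p_i', a) \geq \Omega_{P_i}(p_i, a)$,
$\forall p_i' \in F_{P_i}$. □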

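Theorem 4.2. Let $(p, a)$ be a saddle-point of $\Omega(p, a)$ with respect to minimizing over p and maximizing
over a. We have: $\Omega_{A_j}(a_j, p) = \max_{a_j' \in F_{A_j}} \Omega_{A_j}(a_j', p)$, $\forall j \in \{1, \ldots, n\}$.

Proof. Since $\sum_{j=1}^{n} \Omega_{A_j}(a_j, p) = \Omega_A(a, p) = \max_{a'} \Omega_A(a', p)
= \max_{a' \in F_A} \sum_{j=1}^{n} \Omega_{A_j}(a_j', p) = \sum_{j=1}^{n} \max_{a_j' \in F_{A_j}} \Omega_{A_j}(a_j', p)$,
we have $\sum_{j=1}^{n} (\max_{a_j' \in F_{A_j}} [\Omega_{A_j}(a_j', p)] - \Omega_{A_j}(a_j, p)) = 0$.
On the other hand, $\forall j : \max_{a_j' \in F_{A_j}} [\Omega_{A_j}(a_j', p)] - \Omega_{A_j}(a_j, p) \geq 0$. Hence, $\forall j :
\max_{a_j' \in F_{A_j}} [\Omega_{A_j}(a_j', p)] - \Omega_{A_j}(a_j, p) = 0$. □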
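
The key step of this proof is that a sum of terms, each depending on a different adversary's action, is maximized by maximizing every term separately. The following brute-force check on a small finite instance illustrates that step; the payoff table and function names are hypothetical, and the finite action sets are an assumption made only to allow enumeration.

```python
import itertools

def joint_max(payoffs):
    """Maximize sum_j payoffs[j][a_j] over all joint action tuples by enumeration."""
    ranges = [range(len(row)) for row in payoffs]
    return max(
        sum(payoffs[j][a] for j, a in enumerate(joint))
        for joint in itertools.product(*ranges)
    )

def sum_of_maxes(payoffs):
    """Let each adversary maximize its own term independently, then add up."""
    return sum(max(row) for row in payoffs)
```

Both functions return the same value on any such table, which is why no adversary can gain by deviating individually once the joint maximum is attained.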

Theorem  4.2  allows  us  to  conclude  that  the  individual  part  of  an  adversarial  agent’s  negotiation 
outcome  is  a  best-response  action: 

plans before choosing their own plan is that we could run into a deadlock. In general, we could enforce a commitment
to a certain plan prior to learning about the other agents' current choices by employing cryptographic methods such as
commitment schemes [Blu81, Eve82, Nao91].
Corollary 4.3. Let $F_{A_j}$ denote the feasible set of adversarial agent $A_j$ ($j \in \{1, \ldots, n\}$) and $F_{P_i}$
denote the feasible set of player agent $P_i$ ($i \in \{1, \ldots, k\}$). We have:

$\forall j \; \forall a_j' \in F_{A_j} \; \forall p \in K_P, a \in K_A : \Omega_{A_j}(a_j', p) \leq \Omega_{A_j}(a_j, p)$.


Budget balance. Since our overall goal is a socially optimal plan, we would hope that our mechanism
neither siphons off money from the agents by running a surplus, nor requires continuous
investment to fund a deficit. This is the question of budget balance. Since the agents make payments
only to one another (and not directly to the mechanism), in one sense our mechanism is
trivially budget balanced. However, a more interesting question is whether the mechanism is budget
balanced if we consider the adversarial agents to be part of the mechanism; this additional
property guarantees that the adversarial agents do not, in the long run, siphon off money or require
external funding. Since we showed (in Sec. 3.1) that the adversarial agents each have zero revenue
at any saddle point, and since the outcome of negotiation is an approximate saddle point, our
mechanism is (approximately) budget balanced in this sense as well.

Budget  balance  can  be  evaluated  ex  ante,  ex  interim,  or  ex  post,  depending  on  whether  it  holds 
(in  expectation)  before  the  agents  know  their  private  information,  after  they  know  their  private 
information  but  before  they  know  the  outcome  of  the  mechanism,  or  after  they  know  the  outcome 
of  the  mechanism.  Ex-post  budget  balance  is  the  strongest  property;  our  argument  in  fact  shows 
approximate  ex-post  budget  balance. 

Individual rationality. Strategic agents will avoid participating in a mechanism if doing so improves
their payoffs. A mechanism is individually rational if each agent is no worse off when
joining  the  mechanism  than  when  avoiding  it.  Just  as  with  budget  balance,  we  can  speak  of  ex- 
ante,  ex-interim,  or  ex-post  individual  rationality. 

To  make  the  question  of  individual  rationality  well-defined,  we  need  to  specify  what  happens  if  an 
agent  avoids  the  mechanism.  If  an  adversarial  agent  refuses  to  participate,  we  will  assume  that  her 
corresponding  resource  goes  unmanaged:  no  price  is  announced  for  it,  and  the  player  agents  pay 
their  natural  costs  for  it.  The  adversarial  agent  therefore  gets  no  revenue,  either  positive  or  negative. 
If a player agent refuses to participate, we will assume that she is constrained to use no resources,
that is, $\langle l_{ji}, p_i \rangle = 0$ for all j. (So, we assume that there is a plan satisfying these constraints.)

Since  we  showed  that  supaj  (p.  a3)  —  /3j(i/,-(p))]  =  0  (in  Sec.  3.1),  A;  has  (approximately)  no 
incentive  to  avoid  the  mechanism  when  we  play  an  (approximate)  saddle  point.  So,  the  mechanism 
is  approximately  ex-post  IR  for  adversaries. 

If a player agent does not participate in the mechanism, she has no chance of acquiring any resources.
Since she would not have to pay for joining and using no resources (the remainder $r_j(a_j)$
is nonnegative), it is irrational not to join. So, the mechanism is ex-post IR for players.

Efficiency.  A  mechanism  is  called  efficient  if  its  outcome  minimizes  global  social  cost.  Thm.  3.4 
showed that the mechanism finds an approximate saddle-point of $\Omega$. We showed in Sec. 3.1 that, in
any saddle-point, the player cost is $\min_p \omega(p)$ (the socially-optimal cost), and the adversary cost
is  0.  So,  in  an  approximate  saddle-point,  the  social  cost  is  approximately  optimal;  the  mechanism 
is  therefore  approximately  efficient. 
5 Related Work


The  idea  of  using  no-regret  algorithms  to  solve  OCPs  in  order  to  accomplish  a  planning  task  is 
not  new  (e.g.,  [BBCM03]).  It  has,  for  instance,  been  proposed  for  online  routing  in  the  Wardrop 
setting  of  multi-commodity  flows  [BEDL06],  where  the  authors  established  convergence  to  Nash 
equilibrium  for  infinitesimal  agents.  In  contrast  with  this  line  of  work,  we  seek  globally  good 
outcomes,  rather  than  just  equilibria. 

Another body of related work is concerned with selfish routing in nonatomic settings (with
infinitesimal agents; e.g., [BEDL06, Rou07]). Many of these works provide strong performance
guarantees  and  price  of  anarchy  results  considering  selfish  agents.  We  consider  a  similar  but  not 
identical  setup,  with  a  finite  number  of  agents  and  divisible  resources. 

As mentioned before, our planning approach can be given a simple market interpretation:
interaction among player agents happens indirectly through resource prices learned by the
adversaries. Many researchers have demonstrated experimental success for market-based planners (e.g.,
[SD99,  GM02,  SDZK04,  Wel93,  GK07,  MWY04]).  While  these  works  experimentally  validate 
the  usefulness  of  their  approaches  and  implement  distributivity,  only  a  few  provide  guarantees  of 
optimality  or  approximate  optimality  (e.g.,  [LMK+05,  Wel93]). 

Guestrin  and  Gordon  proposed  a  decentralized  planning  method  using  a  distributed  optimization 
procedure based on Benders decomposition [GG02]. They showed that their method would produce
approximately optimal solutions and offered bounds to quantify the quality of this approximation.
However, as with most authors, they assumed agents to be obedient, i.e., to follow the
protocol in every aspect. By contrast, we address strategic agents, i.e., selfish, incentive-driven
entities prone to deviating from prescribed behavior if it serves their own benefit. But, since the
trick of dualizing constraints to decouple an optimization problem is analogous to Benders decomposition,
we can view our mechanism as a generalization of Guestrin and Gordon's method to
decentralized  computation  on  selfish  agents. 

Designing  systems  that  provably  achieve  a  desired  global  behavior  with  strategic  agents  is  exactly 
the  field  of  study  of  classic  mechanism  design.  Many  mechanisms,  though,  are  heavy-weight  and 
centralized,  and  are  concerned  neither  with  distributed  implementation  nor  with  computational 
feasibility.  Attempting  to  fill  this  gap,  a  new  strand  of  work  under  the  label  distributed  algorithmic 
mechanism  design  has  evolved  [FPS01,  FPSS05,  Wel93]. 

Our  approach  combines  many  advantages  of  the  above  branches  of  work  for  multiagent  planning. 
It  is  distributed,  and  provides  asymptotic  guarantees  regarding  mechanism  design  goals  such  as 
budget balance and quality of the learned solution. If we consider the adversarial agents to be
independent, selfish entities that are not part of the mechanism, the proposed mechanism is relatively
light-weight;  it  merely  offers  infrastructure  for  the  participating  agents  to  coordinate  their  planning 
efforts  through  learning  in  a  repeated  game.  And,  as  the  following  section  shows,  its  theoretical 
guarantees  translate  into  reliable  practical  performance,  at  least  in  our  small-scale  network  routing 
experiments. 
6 Experiments

We  conducted  experiments  on  a  small  multi-agent  min-cost  routing  domain.  We  model  our  network 
by a finite, directed graph with edges E (physical links) and vertices V (routers). Each edge $e \in E$
has a finite capacity $\gamma(e)$, as well as a fixed intrinsic cost $c_e$ for each unit of traffic routed through
e.  We  assume  that  bandwidth  is  infinitely  divisible. 

Players are indexed by a source vertex s and a destination vertex r. Player $P_{sr}$ wants to send an
amount of flow $d_{sr}$ from s to r. $P_{sr}$'s individual plan is a vector $f_{sr} = (f^e_{sr})_{e \in E}$, where $f^e_{sr} \in [0, U]$
is the amount of traffic that $P_{sr}$ routes through edge e. (U is an upper bound, chosen ahead of time
to  be  larger  than  the  largest  expected  flow.)  Feasible  plans  are  those  that  satisfy  flow  conservation, 
i.e.,  the  incoming  traffic  to  each  vertex  must  balance  the  outgoing  traffic.  We  also  excluded  plans 
which  route  flow  in  circles. 

If  the  total  usage  of  an  edge  e  exceeds  its  capacity,  all  agents  experience  an  increased  cost  for  using 
e.  This  extra  cost  could  correspond  to  delays,  or  to  surcharges  from  a  network  provider.  We  set 
the global penalty for edge e to be a hinge loss function $\beta_e(\nu) = \max\{0, u_e \nu\}$, so the total cost of
e increases linearly with the amount of overuse. For convenience we also modeled each player's
demand $d_{sr}$ as a soft constraint $\beta_{sr}(\nu) = \max\{0, u_{sr} \nu\}$, although doing so is not necessary to
achieve  a  distributed  mechanism. 
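
A minimal sketch of such a hinge penalty for a single edge is given below. It assumes that the penalty argument is the total flow on the edge minus its capacity and that `slope` plays the role of the per-unit overuse cost; both names are introduced here for illustration only.

```python
def hinge_penalty(overuse, slope):
    """Hinge loss max{0, slope * overuse}: zero until the edge is over capacity."""
    return max(0.0, slope * overuse)

def edge_penalty(flows_on_edge, capacity, slope):
    """Penalty for one edge, given each player's flow on that edge.

    Assumes the penalty argument is total flow minus capacity.
    """
    overuse = sum(flows_on_edge) - capacity
    return hinge_penalty(overuse, slope)
```

For instance, two players routing 60 and 70 units over an edge of capacity 100 with slope 2 incur a penalty of 60, while flows of 30 and 40 incur none.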

Applying  our  problem  transformation  led  to  new,  total  player  cost  f)(f ,  a)  =  Y^s  r  cefesr  + 
cap>f)  +  Yls  r  (aj,  f )  in  the  mechanism.  Here,  for  each  edge  e,  we  introduced 

an adversarial agent $A_e$, who controls the cost for capacity violations at e by setting the price
$a_e^{cap}$. And, as a slight extension to our general description in Section 3, we introduced additional
adversarial agents to implement the soft constraints on demand; agent $A_{sr}$ chooses a price $a_{sr}$ for
failing to meet demand on route sr.

With  this  setup,  we  ran  more  than  2800  simulations,  for  the  most  part  on  random  problem  instances, 
but  also  for  manually-designed  problems  on  graphs  of  sizes  varying  between  2  and  16  nodes.  In 
each  instance  there  were  between  1  and  32  player  agents.  For  no-regret  learning,  we  used  the 
Greedy  Projection  algorithm  [Zin03]  with  (^)teN  as  the  sequence  of  learning  rates. 
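
For readers unfamiliar with Greedy Projection, the following is a minimal sketch of the update: take a gradient step on the cost just observed, then project back onto the feasible set. Several simplifications are assumptions of this sketch and not statements about the experiments: the feasible set is a box rather than a flow polytope, the step size $t^{-1/2}$ is the schedule from Zinkevich's analysis, and the function names are hypothetical.

```python
import math

def project_box(x, lower, upper):
    """Euclidean projection onto the box [lower, upper]^d."""
    return [min(max(v, lower), upper) for v in x]

def greedy_projection(grad, x0, steps, lower=0.0, upper=1.0):
    """Online projected gradient descent.

    grad(t, x) returns a (sub)gradient of the round-t cost at x.
    The step size t**-0.5 is an assumption taken from Zinkevich's analysis.
    Returns the final iterate and the average of all played iterates.
    """
    x = project_box(x0, lower, upper)
    avg = [0.0] * len(x)
    for t in range(1, steps + 1):
        avg = [a + (v - a) / t for a, v in zip(avg, x)]   # running mean of plays
        g = grad(t, x)
        eta = 1.0 / math.sqrt(t)
        x = project_box([v - eta * gi for v, gi in zip(x, g)], lower, upper)
    return x, avg
```

The averaged iterate is returned as well because the guarantees in this paper concern average plans rather than the last plan played.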

A  simple  example  of  an  averaged  player  plan  after  a  number  of  iterations  is  depicted  in  Figure  2(b). 
In  this  experiment,  we  had  a  6-node  network  and  three  players  A, 3,  P\a,  and  P46  with  demands 
30, 70 and 110. We set $c_{(2,3)} = c_{(3,2)} = 10$, and $c_e = 1$ for all other edges e. Edges (5, 6) and (6, 5)
had capacities of 50, while all other capacities were 100. Our method successfully discovered that
$P_{4,6}$ should send as much flow as possible through the cheap edge (5, 6), and the rest along the
expensive path through (3, 2). Adversarial agents successfully discouraged the players from violating
the capacity constraints, while simultaneously making sure that as much demand as possible was
satisfied. Also note that player $P_{2,3}$ served the common good by (on average) routing flow through
the pricey edge (2, 3) instead of taking the path through the bottleneck (6, 5); this latter path would
have been cheaper for $P_{2,3}$ if we didn't consider the extra costs imposed by the adversarial agents.
The plots in Figs. 1 and 2 validate our theoretical results: Figs. 1(a),(b) demonstrate that the
regrets of the combined agents P and A converge to zero, as shown in Lem. 3.2 and Lem. 3.3.
Fig. 2(a) demonstrates convergence to a saddle-point of $\Omega$. In the plot, the upper curve shows
$\max_a \Omega(f_{[T]}, a)$, while the lower curve shows $\min_f \Omega(f, a_{[T]})$. The horizontal, dashed line is the
minimax value of $\Omega$ (which was 30 in the corresponding experiment). As guaranteed by Thm. 3.4,

Figure  1:  Average  positive  regret  of  synthesized  player  (a)  and  synthesized  adversary  (b),  averaged 
over  100  random  problems,  as  a  function  of  iteration  number.  Grey  area  indicates  standard  error. 


the  three  curves  converge  to  one  another.  While  the  fact  of  convergence  in  these  figures  is  not  a 
surprise,  it  is  reassuring  to  see  that  the  convergence  is  fast  in  practice  as  well  as  in  theory. 


7  Discussion 

We  presented  a  distributed  learning  mechanism  for  use  in  multiagent  planning.  The  mechanism 
works  by  introducing  adversarial  agents  who  set  taxes  on  common  resources.  By  so  doing,  it 
decouples  the  original  player  agents’  planning  problems.  We  then  proposed  that  the  original  and 
adversarial agents should learn about one another by playing the decoupled planning game repeatedly,
either in reality (the online setup) or in simulation (the negotiation setup).

We established that, if all agents use no-regret learning algorithms in this repeated game, several
desirable properties result. These properties included convergence of $p_{[T]}$, the average composite
plan, to a socially optimal solution of the original planning problem, as well as convergence of
$p_{[T]}$ and the corresponding adversarial tax-plan $a_{[T]}$ to a Nash equilibrium of the game. We also
showed that our mechanism is budget-balanced in the limit of large T.

So  far,  we  do  not  know  in  what  cases  our  mechanism  is  incentive-compatible;  in  particular,  we 
do  not  know  when  it  is  rational  for  the  individual  agents  to  employ  no-regret  learning  algorithms. 
Certainly, we can invent cases where it is not rational to choose a no-regret algorithm, but we
believe that there are practical situations where no-regret algorithms are a good choice. Investigating
this  matter,  and  modifying  the  mechanism  to  ensure  incentive  compatibility  in  all  cases,  is  left  to 
future  work. 

Compared to a centralized planner, our method can greatly reduce the bandwidth needed at the
choke-point agent. (The choke-point agent is the one who needs the most bandwidth; in a centralized
approach it is normally the centralized planner.) In very large systems, agents $P_i$ and $A_j$ only
need to send messages to one another if $P_i$ considers using resource j, so we can often use locality
constraints to limit the number of messages we need to send.


Figure 2: (a): Payoff $\Omega$ for the synthesized player (upper curve) and adversary (lower curve)
when P and A play their averaged strategy against a best-response opponent in each iteration.
Horizontal line shows minimax value. (b): Average plan for three agents in a 6-node instance after
10000 iterations, rounded to integer flows. The displayed plan is socially optimal.


Our method combines desirable features from various previous approaches: like centralized mechanisms
and some other distributed mechanisms we can provide rigorous guarantees such as social
optimality  and  individual  rationality.  But,  like  prior  work  in  market-based  planning,  we  expect 
our  approach  to  be  efficient  and  implementable  in  a  distributed  setting.  Our  experiments  tend  to 
confirm  this  prediction. 
A Background


A.1 Convex Sets and Convex Functions

In  this  subsection  we  will  list  a  couple  of  basic  definitions  and  theorems  regarding  convex  functions 
for  later  use.  The  material  was  extracted  from  [BV04]  and  [Roc70]. 

Definition A.1 (Affine Function). A mapping $\phi$ between vector spaces $V_1$ and $V_2$ is called affine
iff $\exists \alpha \in \hom(V_1, V_2), r \in V_2 \; \forall v_1 \in V_1 : \phi(v_1) = \alpha(v_1) + r$.

Definition A.2 (Affine Sets). $A \subseteq \mathbb{R}^d$ is affine iff $\forall a_1, a_2 \in A \; \forall l \in \mathbb{R} : (1 - l) a_1 + l a_2 \in A$.

We can see that an affine set contains the line through any two of its points by rewriting $(1 - l) a_1 + l a_2$
as $a_1 + l (a_2 - a_1)$. This idea can be generalized to multiple points, leading to the idea of an affine
hull.

Definition A.3 (Affine Hull). Let $S \subseteq \mathbb{R}^d$ be a subset of $\mathbb{R}^d$. Its affine hull aff S is the smallest
(with respect to the inclusion relation $\subseteq$) affine set containing S.

Note, aff $S = \{\sum_{i=1}^{n} r_i s_i \mid s_i \in S, \sum_{i=1}^{n} r_i = 1, n \in \mathbb{N}\}$ (cf. [Roc70]).

Instead  of  asking  that  for  any  two  points  of  a  set  the  whole  line  through  them  is  contained  in  the 
set,  we  could  be  interested  in  sets  with  the  property  that  only  all  points  on  the  line  segment  between 
any  two  of  its  points  need  to  be  contained  as  well.  This  leads  to  the  important  notion  of  convex 
sets. 

Definition A.4 (Convex Sets). A set of vectors $C \subseteq \mathbb{R}^d$ is convex iff $\forall c_1, c_2 \in C \; \forall l \in [0, 1] :
(1 - l) c_1 + l c_2 \in C$.
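
Definition A.4 can be checked numerically on a simple convex set. The sketch below uses a set cut out by linear inequalities (a hypothetical unit square) and verifies that convex combinations of two members remain members; the helper names and the tolerance are assumptions of this example.

```python
def in_polyhedron(v, inequalities, tol=1e-9):
    """True if <l, v> <= y holds for every (l, y) pair in `inequalities`."""
    return all(
        sum(li * vi for li, vi in zip(l, v)) <= y + tol
        for l, y in inequalities
    )

def convex_combination(c1, c2, lam):
    """Return (1 - lam) * c1 + lam * c2 for lam in [0, 1]."""
    return [(1 - lam) * a + lam * b for a, b in zip(c1, c2)]

# Hypothetical unit square: 0 <= v[0] <= 1 and 0 <= v[1] <= 1.
UNIT_SQUARE = [([1, 0], 1), ([-1, 0], 0), ([0, 1], 1), ([0, -1], 0)]
```

A finite number of sampled combinations cannot prove convexity, of course; the check only illustrates what the definition asks for.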

As an example, we will describe the polyhedrons. Polyhedrons are solution sets of a collection of
linear inequalities and equalities [BV04]. An example of a polyhedron is the set
$\{v \in \mathbb{R}^d \mid \langle l_1, v \rangle \leq y_1, \ldots, \langle l_n, v \rangle \leq y_n, \langle e, v \rangle = x\}$. Bounded polyhedrons are usually called
polytopes. The convex hull of a finite set of points (the intersection of all convex supersets) is a polytope [Roc70].
Following standard literature we will usually consider functions $\phi$ of the form $\phi : S \subseteq \mathbb{R}^d \to
\mathbb{R} \cup \{-\infty, +\infty\}$. We follow the customary convention to understand by $-\infty, +\infty$ mathematical
objects with the property $\forall x \in \mathbb{R} : -\infty < x < +\infty$. From this point of view, a distinction between
$\phi$'s domain S and its effective domain dom $\phi := \{s \in S \mid \phi(s) < +\infty\}$ is obtained. Instead of
considering functions on S we will extend them to all of $\mathbb{R}^d$ by setting their values at all points
in $\mathbb{R}^d$ not in S to $+\infty$. This has technical benefits which will not become apparent in this work but
it is done to make our assumptions consistent with the ones needed for theorems we will borrow
from [Roc70].

As  we  will  see,  another  way  to  define  the  effective  domain  of  a  function  is  by  its  epigraph  -  the  set 
of  all  points  lying  on  or  above  its  graph. 

Definition A.5 (Epigraph). The epigraph epi $\phi$ of a function $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{-\infty, +\infty\}$ is
epi $\phi := \{(s, r) \in \mathbb{R}^d \times \mathbb{R} \mid r \geq \phi(s)\}$.

Note, points in the epigraph only have real components, i.e. points with $-\infty$ or $+\infty$ components
are not included. We can now define dom $\phi := \{s \in \mathbb{R}^d \mid \exists r \in \mathbb{R} : (s, r) \in \text{epi } \phi\}$.

Definition  A.6  (Convex  Function).  A  real-valued  function  is  convex  iff  its  epigraph  is  a  convex 
set. 

Definition A.7 (Concave Function). A real-valued function $\phi$ is called concave iff $-\phi$ is convex.

Theorem A.8 (Jensen's Inequality for Convex Functions (cf. [BV04])). If $\phi$ is a convex function
and $z_1, \ldots, z_n$ are positive real numbers such that $z_1 + \ldots + z_n = 1$, then:

$\forall x_1, \ldots, x_n \in$ dom $\phi : \phi(z_1 x_1 + \ldots + z_n x_n) \leq z_1 \phi(x_1) + \ldots + z_n \phi(x_n)$.

By multiplying both sides of the inequality by $-1$ we can conclude:

Corollary A.9 (Jensen's Inequality for Concave Functions). If $\phi$ is a concave function and $z_1, \ldots, z_n$
are positive real numbers such that $z_1 + \ldots + z_n = 1$, then:

$\forall x_1, \ldots, x_n \in$ dom $\phi : \phi(z_1 x_1 + \ldots + z_n x_n) \geq z_1 \phi(x_1) + \ldots + z_n \phi(x_n)$.

Remark A.10. It is to be noted that an affine function is both convex and concave. In particular,
Jensen’s  inequality  holds  in  both  directions,  i.e.  Jensen’s  inequality  is  actually  an  equality  for  affine 
functions. 
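
Jensen's inequality and the affine special case can be illustrated numerically. The helper below is a hypothetical name introduced for this sketch; it returns the gap between the two sides of the inequality, which is nonnegative for a convex function and zero for an affine one.

```python
def jensen_gap(phi, xs, zs):
    """Return sum_i z_i * phi(x_i) - phi(sum_i z_i * x_i).

    The weights zs must be positive and sum to one.
    """
    assert abs(sum(zs) - 1.0) < 1e-12 and all(z > 0 for z in zs)
    mean = sum(z * x for z, x in zip(zs, xs))
    return sum(z * phi(x) for z, x in zip(zs, xs)) - phi(mean)
```

With the convex function x squared, the points 1 and 3, and equal weights, the gap is 5 - 4 = 1; with an affine function such as 3x + 1 the gap vanishes.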

Remark A.11. dom $\phi$ results from the epigraph by projection onto $\mathbb{R}^d$, which is a linear transformation.
Since it is known that linear transformations preserve convexity (cf. [Roc70]), the effective
domain of a convex function must be convex.

Theorem A.12 (Cf. [Roc70]). The pointwise supremum of an arbitrary collection of convex functions
is convex.

Proof. The proof follows [Roc70]. Let I denote an index set and $(\phi_i)_{i \in I}$ a collection of convex
functions. Then $\phi(x) := \sup\{\phi_i(x) \mid i \in I\} \Rightarrow$ epi $\phi = \bigcap_{i \in I}$ epi $\phi_i$. It is known that the intersection of
convex sets results in a convex set again. □
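
As a worked instance of Theorem A.12, consider a hinge function of the kind used as a penalty in the experiments. The symbols $u$ and $\nu$ below are generic and chosen for this illustration. The hinge is the pointwise maximum of two affine (hence convex) functions, so its epigraph is the intersection of two half-planes and the function is convex:

```latex
\beta(\nu) = \max\{0,\; u\nu\} = \sup\{\phi_1(\nu), \phi_2(\nu)\},
\qquad \phi_1(\nu) = 0, \quad \phi_2(\nu) = u\nu,
\\
\operatorname{epi} \beta
  = \operatorname{epi} \phi_1 \cap \operatorname{epi} \phi_2
  = \{(\nu, r) \mid r \geq 0\} \cap \{(\nu, r) \mid r \geq u\nu\}.
```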

In  order  to  be  able  to  state  the  next  theorem  we  will  need  to  introduce  more  terminology. 

Definition A.13 (Proper Convex Function). A convex function $\phi$ is called proper iff dom $\phi \neq \emptyset$
and $\forall x \in$ dom $\phi : \phi(x) > -\infty$.

We  could  say  proper  convex  functions  represent  the  non-pathological  case,  they  are  the  type  of 
functions  we  would  normally  consider. 

Definition A.14 (Relative Interior). Let C be a convex set. Its relative interior ri C is defined as the set of its
interior points relative to aff C: ri $C = \{x \in C \mid \exists r > 0 : B(x, r) \cap \text{aff } C \subseteq C\}$.

The  motivation  for  introducing  the  concept  of  relative  interior  points  can  be  best  understood  by  an 
example. Consider a line segment in $\mathbb{R}^3$. It does not have truly interior points in the whole metric
space $\mathbb{R}^3$. However, the points between its two delimiting end points are interior points relative to
its affine hull, which is a line (a copy of $\mathbb{R}^1$).

Note, if we denote the closure of C by cl C we have: ri $C \subseteq C \subseteq$ cl $C \subseteq$ cl (aff C) = aff C
([Roc70]).
Theorem A.15. Let $\phi$ be a proper convex function and let S be any closed, bounded subset of
ri (dom $\phi$). Then $\phi$ is Lipschitzian relative to S.

Proof. See [Roc70]. □

A.2  Saddle  Points  and  Game-Theoretic  Essentials 

This  work  uses  some  game-theoretic  notions  that  will  be  briefly  listed  in  this  section.  For  a 
proper  game-theoretic  introduction  the  reader  is  advised  to  refer  to  related  textbooks  such  as 
[Owe95,  MJHA05].  The  game-theoretic  terminology  we  need  for  our  exposition  is  very  basic 
and  furthermore,  our  reduction  to  the  synthesized  agents  allows  us  to  restrict  our  attention  to  the 
case of two-player games. Central to two-player games are minimax-equilibria, which
correspond  to  the  general  notion  of  saddle-points. 

A.2.1  Saddle-points 

Minimax  theory  treats  a  class  of  optimization  problems  which  involve  not  just  maximization  or 
minimization,  but  a  combination  of  the  two.  We  will  confine  our  considerations  to  the  case  of 
continuous  functions  on  compact  domains.  A  treatment  of  the  more  general  case  can  be  found  in 
[Roc70]. Let $\phi : X \times Y \to \mathbb{R}$ where X, Y are compact sets.

We can consider a minimization problem for the function $x \mapsto \max_{y \in Y} \phi(x, y)$ on X and a
maximization problem for the function $y \mapsto \min_{x \in X} \phi(x, y)$ on Y.

It is well known that we always have $\max_{y \in Y} \min_{x \in X} \phi(x, y) \leq \min_{x \in X} \max_{y \in Y} \phi(x, y)$.

If the values $\min_{x \in X} \max_{y \in Y} \phi(x, y)$ and $\max_{y \in Y} \min_{x \in X} \phi(x, y)$ coincide, the common value v
is called minimax- or saddle-value of function $\phi$ (with respect to minimizing over X and maximizing
over Y).

Definition A.16 (Saddle-point). A point $(\bar{x}, \bar{y}) \in X \times Y$ is a saddle-point (with respect to
minimizing over X and maximizing over Y) if

$\forall x \in X \; \forall y \in Y : \phi(\bar{x}, y) \leq \phi(\bar{x}, \bar{y}) \leq \phi(x, \bar{y})$.

An adaptation of Lemma 36.2 in [Roc70] is:

Lemma A.17. $(\bar{x}, \bar{y}) \in X \times Y$ is a saddle-point of $\phi$ (with respect to minimizing over X and
maximizing over Y) iff $\bar{x} \in \arg\min_{x \in X} \max_{y \in Y} \phi(x, y)$, $\bar{y} \in \arg\max_{y \in Y} \min_{x \in X} \phi(x, y)$ and
the minimax value v exists. If $(\bar{x}, \bar{y})$ is a saddle-point then $\phi(\bar{x}, \bar{y}) = v$.
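
For a finite table of values these notions can be checked by enumeration. The sketch below treats rows as the minimizer's choices and columns as the maximizer's; the finite setting and the function names are assumptions of this example, and only pure (unrandomized) choices are considered.

```python
def pure_saddle_points(M):
    """Return all (i, j) such that M[i][j] is the largest entry in row i
    and the smallest entry in column j, i.e. a saddle-point of the table."""
    points = []
    for i, row in enumerate(M):
        for j, v in enumerate(row):
            if v == max(row) and v == min(r[j] for r in M):
                points.append((i, j))
    return points

def maxmin(M):
    """max over columns of the column minimum."""
    return max(min(r[j] for r in M) for j in range(len(M[0])))

def minmax(M):
    """min over rows of the row maximum."""
    return min(max(row) for row in M)
```

When a saddle-point exists, `maxmin` and `minmax` agree with the value at that point; when none exists among pure choices, `maxmin` is strictly smaller than `minmax`.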

A.2.2  Some  fundamental  game-theoretic  notions 

Many game-theoretic problems can be reduced to a specific type of game called normal-form
game.

Definition A.18 (cf. [SLBon]). A finite k-player normal form game is a tuple $(P, N, A, O, \mu, \varpi)$,
where P is a finite set of k players indexed by i, $A = (A_1, \ldots, A_k)$, where $A_i$ is a finite set of
actions (or synonymously pure strategies) available to the ith player. Each $a = (a_1, \ldots, a_k) \in A$
is called action profile. O denotes a set of outcomes, $\mu : A \to O$ maps an action profile to an outcome,
$\varpi = (\varpi_1, \ldots, \varpi_k)$ is the ordered set of individual payoff functions, where $\varpi_i : O \to \mathbb{R}$ determines
the individual payoff for the ith player.

Note,  a  more  wide-spread  definition  of  normal-form  games  avoids  the  introduction  of  the  outcome 
set  O  and  directly  maps  actions  to  payoffs.  We  can  consider  this  as  a  special  case  of  our  definition 
by setting O := A and $\mu$ to the identity mapping. Unless otherwise stated, we assume to work
with  this  latter  configuration.  Also  note,  we  can  easily  extend  the  definition  to  handle  infinite  (but 
compact)  action  sets  as  will  be  required  for  our  treatment. 

Agents  are  generally  not  required  to  always  play  the  same,  fixed  action.  Instead  they  may  be 
allowed to randomize over their action set. A distribution over an individual action set $A_i$ is called
a mixed strategy. A pure strategy can be considered as a special case of a mixed strategy. Many
times, agents playing actions generated according to mixed strategies can have higher expected
payoffs than if they restricted themselves to deterministic behavior. The ordered set of all
mixed  strategies  s  =  (si, ...,  sk)  of  all  agents  is  called  strategy  profile. 

In  game  theory,  it  is  standard  to  assume  agents  playing  the  game  act  rational,  i.e.  their  behavior 
is  governed  by  the  desire  to  maximize  their  individual  payoff  functions.  Given  the  other  agents’ 
strategy  profile  s-,*  it  is  in  the  best  interest  of  the  ith  rational  player  to  play  a  best  response  s*  to  it. 
The  best  response  s*  is  given  by  s*  =  argmaxs.zz7j(sj,  s-f). 

Certain types of strategy profiles are of special interest to strategic
agents. A famous one is the so-called Nash-equilibrium.

Definition A.19 (Nash-equilibrium). A strategy profile $s = (s_1, \ldots, s_k)$ is a Nash-equilibrium if,
for each $i \in \{1, \ldots, k\}$ and every strategy $\tilde{s}_i$ of player $i$, we have $u_i(\tilde{s}_i, s_{-i}) \leq u_i(s_i, s_{-i})$.

In  other  words,  a  Nash-equilibrium  is  a  strategy  profile  where  each  player  plays  best-response  to 
the  strategies  of  all  other  players.  Hence,  if  every  player  knew  that  the  current  strategy  profile  is  a 
Nash-equilibrium,  no  single  player  would  have  an  incentive  to  unilaterally  deviate  from  its  current 
strategy. 
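
The "no unilateral deviation" condition can be checked mechanically for pure strategies. The following is a minimal sketch for a two-player game given by payoff tables; the function name and the prisoner's-dilemma payoffs are assumptions chosen for illustration and do not come from the paper.

```python
from itertools import product

def pure_nash_equilibria(u1, u2):
    """Return all pure action profiles (i, j) of a two-player game from which
    no player can gain by deviating unilaterally.
    u1[i][j], u2[i][j]: payoffs of player 1 / player 2 when (i, j) is played."""
    n_rows, n_cols = len(u1), len(u1[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Player 1 varies its row while the column j stays fixed, and vice versa.
        best_for_1 = all(u1[k][j] <= u1[i][j] for k in range(n_rows))
        best_for_2 = all(u2[i][k] <= u2[i][j] for k in range(n_cols))
        if best_for_1 and best_for_2:
            equilibria.append((i, j))
    return equilibria

# Hypothetical prisoner's-dilemma payoffs: action 0 = cooperate, 1 = defect.
u1 = [[-1, -3], [0, -2]]
u2 = [[-1, 0], [-3, -2]]
print(pure_nash_equilibria(u1, u2))  # [(1, 1)]
```

Mutual defection is the only profile in which each action is a best response to the other, which matches Definition A.19 restricted to pure strategies.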

As a special case, we now consider two-player (normal-form) games. Such a game is called
zero-sum if $u_1(s_1, s_2) = -u_2(s_1, s_2)\ \forall s_1, s_2$. Therefore, we can restate, say, player 1’s desire
to maximize its payoff as the goal of minimizing its loss, given by $u := u_2$.

A famous result due to J. von Neumann, called the Minimax Theorem or von Neumann’s Theorem,
states that in every Nash-equilibrium of a finite, two-player zero-sum game the minmax value is
attained and equals the maxmin value, and that every such game has a Nash-equilibrium.
So, a Nash-equilibrium in a two-player zero-sum game is given by a saddle-point $(s_1, s_2)$ of $u$
(which follows directly from Definition A.16, or from the Minimax Theorem in conjunction with
Lemma A.17). Since $u(s_1, s_2)$ equals the minimax value, such a saddle-point forming the
Nash-equilibrium is commonly referred to as a minimax-equilibrium of the game.
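
To make the saddle-point characterization concrete, consider matching pennies as a worked example (an illustration added here; the loss matrix is an assumption, not taken from the paper). Let player 1 play its first action with probability $q$ and player 2 with probability $r$, and let player 1's loss be $+1$ if the actions match and $-1$ otherwise:

```latex
\begin{align*}
u(q, r) &= qr - q(1-r) - (1-q)r + (1-q)(1-r) = (2q-1)(2r-1), \\
\min_{q \in [0,1]} \max_{r \in [0,1]} u(q, r) &= \min_{q} |2q-1| = 0, \\
\max_{r \in [0,1]} \min_{q \in [0,1]} u(q, r) &= \max_{r} \, -|2r-1| = 0.
\end{align*}
```

Both values are attained at $q = r = 1/2$, which is the saddle-point of $u$ and hence the minimax-equilibrium of this game.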

B  Supplementary prerequisites

Next, we establish some properties needed in Appendix C. We do not claim that they are novel,
since the material is standard. However, we provide our own derivations.

Lemma B.1. Let $\Omega : X \times Y \to \mathbb{R}$ be a continuous mapping (on $X \times Y$) where $X, Y$ are compact
subsets of Hilbert-spaces. We have:

$\Psi : Y \to \mathbb{R},\ y \mapsto \inf_{x \in X} \Omega(x, y)$ is continuous.

Proof. Due to compactness we have $\Psi(y) = \min_{x \in X} \Omega(x, y),\ \forall y$. Let $(y_n)$ be a sequence in $Y$
converging to $y \in Y$ as $n \to \infty$. We wish to show that $\lim_{n \to \infty} \Psi(y_n) = \Psi(y)$. Before we
proceed with the proof we need to establish two prerequisites.

(i) Let $(a_n)$ be a sequence in $H$ and $a \in H$ where $H$ is a Hilbert-space with norm $\|\cdot\|$.

CLAIM 1: If every subsequence of $(a_n)$ contains a subsequence converging to $a$ then $(a_n)$ itself
converges to $a$.

Proof (CLAIM 1): We argue by contraposition. Let $a_n \not\to a$. Hence, $\exists \epsilon > 0\ \forall n \in \mathbb{N}\ \exists m(n) \geq n : \|a_{m(n)} - a\| > \epsilon$.
Choosing the indices $m(n)$ to be strictly increasing, we can define the subsequence $(a_{m(n)})$ where
$\|a_{m(n)} - a\| > \epsilon,\ \forall n \in \mathbb{N}$. This subsequence of $(a_n)$ does not contain a subsequence
converging to $a$. q.e.d.

(ii) Let $(y_n)$ be a sequence in $Y$ converging to $y \in Y$ as $n \to \infty$ and $(a_n)$ be the sequence in $\mathbb{R}$
defined as $a_n = \Psi(y_n),\ \forall n \in \mathbb{N}$.

CLAIM 2: There exists a subsequence of $(a_n)$ converging to $\Psi(y)$.

Proof (CLAIM 2): Define an arbitrary sequence $(x_n)$ such that $\Omega(x_n, y_n) = \min_{x \in X} \Omega(x, y_n),\ \forall n \in \mathbb{N}$.
Note, such a sequence exists (owing to compactness of $X$) and we have $\forall n \in \mathbb{N} : \Psi(y_n) = \Omega(x_n, y_n)$.

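Since $X \times Y$ is a compact set in a complete metric space, there exists a convergent subsequence
$(x'_n, y'_n)$ of sequence $(x_n, y_n)$. Let $x \in X$ such that $(x'_n, y'_n) \to (x, y)$.

We have: $\forall v \in X : \Omega(v, y) = \lim_{n \to \infty} \Omega(v, y'_n) \overset{6}{\geq} \lim_{n \to \infty} \Omega(x'_n, y'_n) \overset{7}{=} \Omega(x, y)$. Hence,
$\Omega(x, y) = \min_{v \in X} \Omega(v, y) = \Psi(y)$.

Define $(a'_n)$ where $a'_n := \Psi(y'_n)$. We know $(a'_n)$ is a subsequence of $(a_n)$. Now, we have
$\lim_{n \to \infty} a'_n = \lim_{n \to \infty} \Psi(y'_n) = \lim_{n \to \infty} \Omega(x'_n, y'_n) = \Omega(x, y) = \Psi(y)$. q.e.d.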

The final argument completing the proof is the following: If $(a'_n)$ is any subsequence of $(a_n) =
(\Psi(y_n))$ then there is a subsequence $(y'_n)$ of $(y_n)$ still converging towards $y$ such that
$a'_n = \Psi(y'_n)\ \forall n \in \mathbb{N}$. Sequence $(a'_n)$ meets the preconditions of (ii) and thus, we know that there
is a subsequence of subsequence $(a'_n)$ which converges towards $a = \Psi(y)$.

Hence, every subsequence of $(a_n)$ contains a subsequence that converges to $a$. By (i), we know
that $\Psi(y_n) = a_n \to a = \Psi(y)$ as $n \to \infty$.

□ 

6 Since by definition $x'_n \in \arg\min_{x \in X} \Omega(x, y'_n)$.

7 By continuity.

Providing  an  analogous  argument  one  can  prove  the  following  lemma: 

Lemma B.2. Let $\Omega : X \times Y \to \mathbb{R}$ be a continuous mapping (on $X \times Y$) where $X, Y$ are compact
subsets of Hilbert-spaces. We have:

$\Phi : X \to \mathbb{R},\ x \mapsto \sup_{y \in Y} \Omega(x, y)$ is continuous.

Proof. The proof is completely analogous to the one provided for Lemma B.1 and is omitted
here. □
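
As a small worked illustration of Lemmas B.1 and B.2 (the function below is chosen here for illustration and does not appear elsewhere in the paper), take $X = Y = [-1, 1]$ and $\Omega(x, y) = xy$:

```latex
\begin{align*}
\Psi(y) &= \min_{x \in [-1,1]} xy = -|y|, \\
\Phi(x) &= \max_{y \in [-1,1]} xy = |x|.
\end{align*}
```

Both functions are continuous, as the lemmas assert, although neither is differentiable at 0: partial optimization over a compact set preserves continuity but, in general, not smoothness.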


C  Derivation  of  Theorems  3.4  and  3.5 

In Section 3.2.2, we presented Theorems 3.4 and 3.5. Theorem 3.4 established the convergence
of the averaged plans $p_{[T]} = \frac{1}{T} \sum_{t=1}^{T} p_{(t)}$ and $a_{[T]} = \frac{1}{T} \sum_{t=1}^{T} a_{(t)}$ to a set $K_P \times K_A$ of feasible
saddle points of the overall social player cost function $\Omega$, an insight that served as a linchpin for
the ensuing theoretical argumentation regarding properties of the negotiation setup. Both theorems
enabled us to establish the connection of the original planning problem to the planning outcome
spawned by our mechanism and allowed us to infer social optimality results.

For  the  sake  of  a  coherent  exposition  we  decided  to  omit  the  respective  proofs  in  Section  3.2.2. 
This  gap  shall  be  closed  now. 

C.1  Maxmin and minmax inequalities

Before commencing with the proofs we need to establish some preliminary results. As always, we
assume that all agents employ no-regret learning and thus, the synthesized agents $P$ and $A$ have
sublinear regret bounds $\Delta_P$ and $\Delta_A$, respectively.

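Lemma C.1. Let $p_{[T]} := \frac{1}{T} \sum_{t=1}^{T} p_{(t)}$, $a_{[T]} := \frac{1}{T} \sum_{t=1}^{T} a_{(t)}$ be the respective average strategies
until time step $T$.

We have:

\[ \min_p \max_a \Omega(p, a) \leq \min_p \Omega(p, a_{[T]}) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T} \leq \max_a \min_p \Omega(p, a) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T} \]

and

\[ \min_p \max_a \Omega(p, a) \leq \max_a \Omega(p_{[T]}, a) \leq \max_a \min_p \Omega(p, a) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T}. \]

Proof.

\begin{align*}
\min_p \max_a \Omega(p, a) &\leq \max_a \Omega(p_{[T]}, a) \\
&= \max_a \Omega\Big(\tfrac{1}{T} \sum_{t=1}^{T} p_{(t)}, a\Big) \\
&\overset{8}{\leq} \max_a \frac{1}{T} \sum_{t=1}^{T} \Omega(p_{(t)}, a) \\
&\overset{9}{\leq} \frac{1}{T} \sum_{t=1}^{T} \Omega(p_{(t)}, a_{(t)}) + \frac{\Delta_A(T)}{T} \\
&\overset{10}{\leq} \min_p \frac{1}{T} \sum_{t=1}^{T} \Omega(p, a_{(t)}) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T} \\
&\overset{11}{\leq} \min_p \Omega(p, a_{[T]}) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T} \\
&\leq \max_a \min_p \Omega(p, a) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T}.
\end{align*}

□

8 $\Omega(\cdot, a)$ convex $\forall a$.
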

Remark C.2. The proof of the above Lemma was inspired by Freund and Schapire’s alternative proof
of von Neumann’s minimax theorem, which they presented in [FS96]. They considered a zero-sum
matrix game between a no-regret learning player and a best-response environment (adversary).
Their payoff-function model allowed them to leverage bi-linearity in their derivations. In contrast,
we consider both player and adversary to be no-regret learners and assume player cost to be
merely continuous and convex-concave. (In addition, we tolerate the (synthesized) adversary $A$ to
have a slightly different revenue function $\Omega_A(p, a) = \Omega(p, a) - \langle c, p \rangle$.)$^{12}$

Remark C.3. It is standard knowledge that for any real-valued function $\kappa$ on a nonempty product
set $X \times Y$, we have $\sup_{x \in X} \inf_{y \in Y} \kappa(x, y) \leq \inf_{y \in Y} \sup_{x \in X} \kappa(x, y)$ [Roc70]. For our case, this
fact translates to $\max_{a \in F_A} \min_{p \in F_P} \Omega(p, a) \leq \min_{p \in F_P} \max_{a \in F_A} \Omega(p, a)$, since we work with
compact sets and continuous functions.
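
The inequality of Remark C.3 is easy to check numerically on a finite product set. The sketch below is an illustration only; the table `kappa` and the function name are assumptions chosen so that the inequality is strict.

```python
def maxmin_and_minmax(kappa):
    """kappa[x][y]: value for row choice x and column choice y.
    Returns (max over x of min over y, min over y of max over x)."""
    maxmin = max(min(row) for row in kappa)
    minmax = min(max(col) for col in zip(*kappa))  # zip(*kappa) iterates columns
    return maxmin, minmax

kappa = [[3, 1], [0, 2]]
lower, upper = maxmin_and_minmax(kappa)
print(lower, upper)  # 1 2
```

Here the two values differ because the table has no saddle point in pure choices; closing this gap requires additional structure such as the convex-concave setting considered in this appendix.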

We now have the means to show that the minmax and the maxmin values coincide (Theorem C.4).
Such a result was first proven in von Neumann’s well-known minimax theorem ([vN28]) for the case
of two-player, zero-sum matrix games.

Theorem C.4. Let $\Omega : F_P \times F_A \to \mathbb{R}$ be the payoff function of the two-player game between
synthesized player $P$ and adversary $A$ (cf. Sec. 3.2).

Then we have: $\max_a \min_p \Omega(p, a) = \min_p \max_a \Omega(p, a)$.

Proof. By construction of $\Omega$ we have: $\forall a \in F_A : \Omega(\cdot, a) : F_P \to \mathbb{R}$ convex and $\forall p \in F_P :
\Omega(p, \cdot) : F_A \to \mathbb{R}$ concave.

We can learn to play the game employing our no-regret learning mechanism (with one synthesized
player agent $P$ and one synthesized adversarial agent $A$). We can then apply Lemma C.1 to obtain
the inequality $\min_p \max_a \Omega(p, a) \leq \max_a \min_p \Omega(p, a) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T}$. Letting $T \to \infty$,
the regret terms vanish because the bounds are sublinear, and we can conclude
$\min_p \max_a \Omega(p, a) \leq \max_a \min_p \Omega(p, a)$. From Remark C.3 we also know
$\max_a \min_p \Omega(p, a) \leq \min_p \max_a \Omega(p, a)$. □

9 Lemma 3.3.

10 Lemma 3.2.

11 $\Omega(p, \cdot)$ concave $\forall p$.

12 Remember, the actual payoff of the entire collective of adversarial agents is $\Omega_A(p, a) - \sum_{j=1}^{n} \beta_j(\nu_j(p))$.
However, since the player penalty $\beta_j(\nu_j(p))$ in nature for overconsumption of resource $j$ is not controllable by adversarial
agents we could construe the game to be played by an adversary whose payoff function was given by $\Omega_A$.

C.2  Proof of Theorem 3.5

With the help of the inequalities found above, we are able to prove Theorem 3.5. Remember, it
stated that

\[ \frac{1}{T} \sum_{t=1}^{T} \Omega(p_{(t)}, a_{(t)}) \to \min_p \max_a \Omega(p, a) \quad (\text{as } T \to \infty) \]

if $P$ and $A$ incur no regret.

Proof. From the inequalities in the proof of Lemma C.1 we know:

\[ \min_p \max_a \Omega(p, a) \leq \frac{1}{T} \sum_{t=1}^{T} \Omega(p_{(t)}, a_{(t)}) + \frac{\Delta_A(T)}{T} \leq \max_a \min_p \Omega(p, a) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T}. \]

Considering $\Delta_P, \Delta_A \in o(T)$ and the result from Theorem C.4 completes the proof. □
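
The statement can be observed numerically in the simpler matrix-game setting of [FS96]. The sketch below lets both players run the Hedge (multiplicative weights) algorithm, a standard no-regret learner, on an assumed 2x2 payoff matrix whose minimax value is 0.2 with equilibrium strategies (0.4, 0.6) for both sides. The matrix, step size, horizon and function name are illustrative assumptions; this is not the paper's planning mechanism.

```python
import math

def hedge_self_play(M, T, eta):
    """Both players run Hedge on the matrix game p^T M a with full information.
    P (rows) minimizes, A (columns) maximizes.
    Returns (average payoff, average row strategy, average column strategy)."""
    n_rows, n_cols = len(M), len(M[0])
    p = [1.0 / n_rows] * n_rows
    a = [1.0 / n_cols] * n_cols
    p_sum, a_sum, payoff_sum = [0.0] * n_rows, [0.0] * n_cols, 0.0
    for _ in range(T):
        # Expected cost of each row against a, expected gain of each column against p.
        row_cost = [sum(M[i][j] * a[j] for j in range(n_cols)) for i in range(n_rows)]
        col_gain = [sum(p[i] * M[i][j] for i in range(n_rows)) for j in range(n_cols)]
        payoff_sum += sum(p[i] * row_cost[i] for i in range(n_rows))
        p_sum = [s + x for s, x in zip(p_sum, p)]
        a_sum = [s + x for s, x in zip(a_sum, a)]
        # Hedge updates: P down-weights costly rows, A up-weights profitable columns.
        p = [x * math.exp(-eta * c) for x, c in zip(p, row_cost)]
        a = [x * math.exp(eta * g) for x, g in zip(a, col_gain)]
        zp, za = sum(p), sum(a)
        p = [x / zp for x in p]
        a = [x / za for x in a]
    return payoff_sum / T, [s / T for s in p_sum], [s / T for s in a_sum]

M = [[2.0, -1.0], [-1.0, 1.0]]  # assumed example; minimax value 1/5
avg_payoff, p_bar, a_bar = hedge_self_play(M, T=20000, eta=0.01)
print(avg_payoff, p_bar, a_bar)
```

With these settings the standard Hedge regret bound keeps the average payoff within roughly 0.015 of the value 0.2, mirroring the sandwich inequality used in the proof.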


C.3  Proof  of  Theorem  3.4 

Theorem 3.4 played a central role in our theoretical considerations. Remember, it stated the
following:

If synthesized player $P$ and synthesized adversary $A$ suffer sublinear external regret, the average
strategies $p_{[T]} = \frac{1}{T} \sum_{t=1}^{T} p_{(t)}$ of $P$ and $a_{[T]} = \frac{1}{T} \sum_{t=1}^{T} a_{(t)}$ of $A$ converge to a subset of
saddle-points of objective function $\Omega$ (as $T \to \infty$).

Proof. Remember, $F_P \subset V$, $F_A \subset \mathbb{R}^n$ denote the compact, convex feasible sets of the synthesized
player $P$ and adversary $A$, respectively. Before proceeding, we establish some notation. Let
$v := \max_a \min_p \Omega(p, a)$. From Theorem C.4 we know that also $v = \min_p \max_a \Omega(p, a)$. Let the
set of all saddle-points of $\Omega$ be denoted by $E$.

$E = \{(\hat{p}, \hat{a}) \in F_P \times F_A \mid \forall p \in F_P\ \forall a \in F_A : \Omega(\hat{p}, a) \leq \Omega(\hat{p}, \hat{a}) \leq \Omega(p, \hat{a})\}$ is the set of all
feasible saddle points of $\Omega$ (cf. [Roc70]) which correspond to the minimax-equilibria of the game.

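Let $K_P := \{p \in V \mid \forall \epsilon > 0, t \in \mathbb{N}\ \exists T \geq t : \|p - p_{[T]}\| < \epsilon\}$ be the set of all cluster points of
sequence $(p_{[T]})_{T \in \mathbb{N}}$ and $K_A := \{a \in \mathbb{R}^n \mid \forall \epsilon > 0, t \in \mathbb{N}\ \exists T \geq t : \|a - a_{[T]}\| < \epsilon\}$ the set of all cluster
points of sequence $(a_{[T]})_{T \in \mathbb{N}}$. We could restate the definition and say $K_P$ consists of all vectors
being the limit of a subsequence $(s_T)_{T \in \mathbb{N}}$ of $(p_{[T]})_{T \in \mathbb{N}}$. In the same way, $K_A$ consists of all vectors
that are the limit of a subsequence $(u_T)_{T \in \mathbb{N}}$ of $(a_{[T]})_{T \in \mathbb{N}}$. Since we operate in compact subsets of
Hilbert-spaces we know $K_P \subset F_P$, $K_A \subset F_A$ and $K_A, K_P \neq \emptyset \wedge F_P, F_A \neq \emptyset$.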

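The sets $K_P, K_A$ are of great interest because the sequences $(p_{[T]})_{T \in \mathbb{N}}, (a_{[T]})_{T \in \mathbb{N}}$ converge towards
them, respectively. With abuse of notation, we can write $\lim_{T \to \infty} p_{[T]} = K_P$, $\lim_{T \to \infty} a_{[T]} = K_A$.

In order to prove the claim of this theorem, we will show $K_P \times K_A \subset E$, i.e. $(\hat{p}, \hat{a}) \in E,\ \forall \hat{p} \in
K_P, \hat{a} \in K_A$.

To do so, it suffices to show

\[ \forall \hat{p} \in K_P, \hat{a} \in K_A : \max_a \Omega(\hat{p}, a) \leq \Omega(\hat{p}, \hat{a}) \leq \min_p \Omega(p, \hat{a}). \]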


We introduce the well-defined functions $\psi : F_A \to \mathbb{R},\ a \mapsto \min_p \Omega(p, a)$ and $\phi : F_P \to \mathbb{R},\ p \mapsto
\max_a \Omega(p, a)$ and note, $\forall a, p : \psi(a) \leq \Omega(p, a) \leq \phi(p)$.

Combining the well-known maxmin inequality (cf. Remark C.3) and the results from Lemma
C.1, we conclude for all $T \in \mathbb{N}$:

\[ \max_a \min_p \Omega(p, a) \leq \max_a \Omega(p_{[T]}, a) \leq \max_a \min_p \Omega(p, a) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T} \]

and

\[ \max_a \min_p \Omega(p, a) \leq \min_p \Omega(p, a_{[T]}) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T} \leq \max_a \min_p \Omega(p, a) + \frac{\Delta_P(T)}{T} + \frac{\Delta_A(T)}{T}. \]

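Hence, since the regret bounds are sublinear and, trivially, $\min_p \Omega(p, a_{[T]}) \leq \max_a \min_p \Omega(p, a)$,
the following limits exist and are as follows:

\[ \lim_{T \to \infty} \phi(p_{[T]}) = \lim_{T \to \infty} \max_a \Omega(p_{[T]}, a) = \max_a \min_p \Omega(p, a) = v \quad (10) \]

\[ v = \max_a \min_p \Omega(p, a) = \lim_{T \to \infty} \min_p \Omega(p, a_{[T]}) = \lim_{T \to \infty} \psi(a_{[T]}). \quad (11) \]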

Let $\hat{p} \in K_P, \hat{a} \in K_A$ be arbitrary choices. By definition of the cluster sets there exists a
subsequence $(s_T)_T$ of $(p_{[T]})_T$ and a subsequence $(u_T)_T$ of $(a_{[T]})_T$, such that $s_T \to \hat{p} \wedge u_T \to \hat{a}$
$(T \to \infty)$. We know $\Omega, \psi, \phi$ are continuous (cf. Sec. 3 and Appendix B, respectively). Hence,
$v = \lim_{T \to \infty} \psi(a_{[T]}) = \lim_{T \to \infty} \psi(u_T) = \psi(\hat{a})$. Analogously, we obtain: $v = \phi(\hat{p})$. Hence,
$v = \phi(\hat{p}) \geq \Omega(\hat{p}, a)\ \forall a$ and thus, $v \geq \Omega(\hat{p}, \hat{a}) \geq \psi(\hat{a}) = v$. (Since $\hat{p}, \hat{a}$ were arbitrary choices
we conclude $\Omega(K_P, K_A) = \{v\}$.)

With the help of Eq. 10 and Eq. 11, we can now conclude that $(\hat{p}, \hat{a})$ is a saddle point:

\[ \max_a \Omega(\hat{p}, a) = \phi(\hat{p}) = v = \Omega(\hat{p}, \hat{a}) = v = \psi(\hat{a}) = \min_p \Omega(p, \hat{a}). \]

Since $\hat{p}, \hat{a}$ were arbitrary choices we conclude $K_P \times K_A \subset E$. □
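
The convergence of average strategies can also be observed for continuous plans. The following minimal sketch uses projected online gradient descent (for $P$) and ascent (for $A$), a standard no-regret scheme for convex losses, on the assumed toy objective $\Omega(p, a) = p \cdot a$ over $[-1, 1]^2$, whose unique saddle point is $(0, 0)$. The objective, step sizes, starting point and function names are illustrative assumptions rather than the setup of the paper.

```python
import math

def clip(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

def ogd_self_play(T, p0=1.0, a0=1.0):
    """Projected online gradient play on Omega(p, a) = p * a over [-1, 1]^2.
    P minimizes Omega in p, A maximizes it in a; both use step size 1/sqrt(t).
    Returns the averages of the played plans."""
    p, a = p0, a0
    p_sum = a_sum = 0.0
    for t in range(1, T + 1):
        p_sum += p
        a_sum += a
        step = 1.0 / math.sqrt(t)
        grad_p, grad_a = a, p  # dOmega/dp = a, dOmega/da = p
        # Simultaneous update: descent step for P, ascent step for A.
        p, a = clip(p - step * grad_p), clip(a + step * grad_a)
    return p_sum / T, a_sum / T

p_bar, a_bar = ogd_self_play(T=40000)
print(p_bar, a_bar)
```

With the usual regret bound of order $\sqrt{T}$ for this scheme, the argument of the proof gives $|p_{[T]}| + |a_{[T]}| \leq (\Delta_P(T) + \Delta_A(T))/T$, so both averages approach the saddle point as $T$ grows even if the individual iterates keep moving.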


D  Nonnegativity of remainder fee $\eta_j$

As always, let $a^j$ be $A_j$’s plan and let $p$ denote the overall player plan. In Sec. 3.1, we described
how each player agent $P_i$ transfers a payment to each adversarial agent $A_j$. Furthermore, we
mentioned that the remainder $\eta_j(a^j) = a^j y_j + \beta_j^*(a^j)$ was always nonnegative. We will prove
this claim now. As a first step, we show that conjugation reverses inequalities:

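Lemma D.1. Let $d \in \mathbb{N}$, $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{-\infty, \infty\}$, $\gamma : \mathbb{R}^d \to \mathbb{R} \cup \{-\infty, \infty\}$ be convex functions.

If $\forall x \in \mathbb{R}^d : \phi(x) \leq \gamma(x)$, then $\forall y \in \mathbb{R}^d : \phi^*(y) \geq \gamma^*(y)$.

Proof. According to the premise we have $\forall x : \phi(x) \leq \gamma(x)$ and consequently, $\forall x : -\phi(x) \geq
-\gamma(x)$. Hence, $\forall y : \phi^*(y) = \sup_x \big(\langle x, y \rangle - \phi(x)\big) \geq \sup_x \big(\langle x, y \rangle - \gamma(x)\big) = \gamma^*(y)$. □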

We can now show the desired result$^{13}$:

Lemma D.2 (Nonnegativity of remainder $\eta_j$). For all $a^j \geq 0$, we have $\eta_j(a^j) \geq 0$.

13 Remember, the feasible set was always contained in the positive half-space, i.e. $F_A \subset \mathbb{R}^n_+$. Therefore, we only
need to consider nonnegative taxes $a^j$.
Proof. Since $a^j y_j \geq 0$ we only have to show that $\beta_j^*(a^j) \geq 0$. In Sec. 3.1, we required
$\beta_j(x) \leq 0$ for all $x \in \mathbb{R}$ with $x \leq 0$. Thus, $\beta_j(x) \leq \chi(x)$ where

\[ \chi(x) = \begin{cases} 0, & x \leq 0 \\ \infty, & \text{otherwise} \end{cases} \]

is the convex indicator function of the non-positive half-space.

According to Lemma D.1 we have $\forall y \in \mathbb{R} : \beta_j^*(y) \geq \chi^*(y)$. Referring to the definition of a convex
function’s conjugate it is easy to verify that

\[ \chi^*(y) = \begin{cases} 0, & y \geq 0 \\ \infty, & \text{otherwise.} \end{cases} \]

Hence, $\forall a^j \geq 0 : \beta_j^*(a^j) \geq 0$. □
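
As a concrete check of Lemma D.2, one can work out the conjugate of a hinge penalty $\beta(x) = \max(0, \omega x)$ with slope $\omega > 0$. This is a sketch for one particular penalty, which satisfies $\beta(x) \leq 0$ for $x \leq 0$; the $\beta_j$ of the paper may be any admissible function.

```latex
\begin{align*}
\beta^*(y) &= \sup_{x \in \mathbb{R}} \big( xy - \max(0, \omega x) \big)
            = \max\Big( \sup_{x \leq 0} xy,\ \sup_{x \geq 0} x (y - \omega) \Big) \\
           &= \begin{cases} 0, & 0 \leq y \leq \omega \\ \infty, & \text{otherwise.} \end{cases}
\end{align*}
```

In particular $\beta^*(y) \geq 0$ for every $y \geq 0$, in agreement with the lemma.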


E  Hard Inter-Agent Constraints

Throughout the paper we assumed that all inter-agent constraints (resource constraints) in nature
were soft. That is, player agents can violate the constraints as much as they like, as long as they
are willing to pay the increased price for such behavior. In the light of our resource interpretation
of these constraints this translates to the property of nature that resources are unlimited but tend
to become arbitrarily expensive with increasing demand once a certain level of overall resource
consumption is exceeded. As an example of a plausible domain where this model may be
accurate, consider our network routing example where the resources are bandwidth limitations on
the communication links. In a conceivable scenario, we could imagine that once overall usage
of a particular link threatens to exceed the maximal bandwidth, the network provider rents
additional bandwidth from competitors who charge expensive rates that increase monotonically with
the rented bandwidth. Alternatively, the extra cost for resource constraint violation could encode
the delay due to congestion.

However,  one  may  wish  to  apply  our  method  to  planning  in  environments  with  hard  constraints, 
i.e.  in  scenarios  where  a  soft  constraint  model  is  not  applicable  but  where  the  available  resources 
are  strictly  limited. 

In such settings the individual objective functions are linear and the descriptions of the feasible
sets of each player agent contain the constraints $\{\nu_j(p) \leq 0\}_{j=1}^{n}$ where $\nu_j(p) = \langle l_j, p \rangle - y_j$ (cf.
Eq. 1). In other words, instead of having $\min_{p_i} \omega_{P_i}(p)$ s.t. $p_i \in F_{P_i}$ as its individual convex
optimization problem, each player agent $P_i$ should optimally solve:

\[ \min_{p_i} \langle c_i, p_i \rangle \quad \text{s.t.: } p_i \in F_{P_i} \cap H_{p_{-i}} \quad (12) \]

where $H_{p_{-i}} := \{p_i \mid \nu_1(p) \leq 0, \ldots, \nu_n(p) \leq 0\}$. Note, subscript $p_{-i}$ indicates $H_{p_{-i}}$ is parameterized
by $p_{-i}$, being the overall plan of all players other than $P_i$. The socially optimal solution can be found
as the solution to the optimization problem:

\[ \min_p \langle c, p \rangle \quad \text{s.t.: } p \in F_P \cap H \quad (13) \]

where $H := \{p \mid \nu_1(p) \leq 0, \ldots, \nu_n(p) \leq 0\}$.

While our coordination mechanism needs to model the inter-agent constraints as soft, it is
also suitable for achieving approximate coordination in the hard constraint scenario.
That is, it can be set up such that the overall player part $\hat{p}$ of a negotiation outcome is an
approximate solution of optimization problem (13).

To achieve this, we simply need to define an artificial penalization function $\beta_j$ for each hard
inter-agent constraint $\nu_j(p) \leq 0$ and assign an adversarial agent $A_j$ to it as before ($j = 1, \ldots, n$). While
the true cost in nature is $\langle c, p \rangle$ we invoke the negotiation setup with the augmented social cost
function $cc(p) = \langle c, p \rangle + \sum_{j=1}^{n} \beta_j(\nu_j(p))$ and feasible set $F_P$. After learning in the repeated game is
concluded, we know that the negotiation outcome $\hat{p}$ is optimal with respect to the regularized social
cost function we defined, i.e. we have $\hat{p} = \arg\min_{p \in F_P} cc(p) = \arg\min_{p \in F_P} \langle c, p \rangle + \sum_{j=1}^{n} \beta_j(\nu_j(p))$.
In order to make sure that this solution is approximately optimal with respect to optimization
problem (13) we need to assert that $\min_{p \in F_P \cap H} \langle c, p \rangle \approx \langle c, \hat{p} \rangle$ and that $\mathrm{dist}(\hat{p}, F_P \cap H) < \epsilon$ for
some $\epsilon > 0$ which we predefine to encode the extent to which we consider constraint violations to
be negligible.

We can accomplish this by defining regularization functions $\beta_j$ that penalize constraint
violations accordingly. A simple yet perfectly suitable choice for $\beta_j$ may be the prominent
hinge-loss function, which satisfies all the required model assumptions such as continuity, convexity and
monotonicity. That is, we define $\beta_j(\nu_j(p)) = \max(0, \omega_j \nu_j(p))$ where $\omega_j$ is a parameter defining the
slope determining how much each unit of constraint violation is penalized.
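
The role of the slope parameter can be seen in a one-dimensional toy problem. The sketch below is an illustration under assumed data, not an example from the paper: the true cost is $\langle c, p \rangle = -p$, the feasible set is $F_P = [0, 2]$, and the hard constraint is $\nu(p) = p - 1 \leq 0$, so the hard-constrained optimum is $p = 1$. A grid search stands in for the negotiation outcome.

```python
def penalized_minimizer(omega, grid_size=2000):
    """Grid-search minimizer of -p + max(0, omega * (p - 1)) over p in [0, 2]."""
    grid = [2.0 * i / grid_size for i in range(grid_size + 1)]

    def cost(p):
        nu = p - 1.0                      # constraint function nu(p) = p - 1 <= 0
        return -p + max(0.0, omega * nu)  # true cost -p plus hinge penalty

    return min(grid, key=cost)

print(penalized_minimizer(omega=0.5))  # 2.0, penalty too weak: constraint violated
print(penalized_minimizer(omega=5.0))  # 1.0, matches the hard-constrained optimum
```

In this toy instance any slope larger than the marginal gain from violating the constraint (here 1) recovers the hard-constrained solution exactly; in general the slope has to be chosen so that the remaining violation stays below the tolerance $\epsilon$.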



MACHINE  LEARNING 
DEPARTMENT 

Carnegie  Mellon  University 
5000 Forbes Avenue
Pittsburgh,  PA  15213 

